I. INTRODUCTION

In the standard model (SM), the Higgs mechanism [1-3] is responsible for
the spontaneous breaking of the $SU(2) \times U(1)$ gauge symmetry, which
generates the masses of the gauge bosons and, more indirectly, allows for
the fermion masses. This theory predicts the existence of a scalar
particle, the Higgs boson, which remains the only SM particle that has not
been observed by experiment. Although the Higgs boson mass is not
predicted by theory, direct searches at the LEP and Tevatron collider
experiments have set limits that constrain the Higgs boson mass to be
between 114.4 and 156 GeV/$c^2$ or above 175 GeV/$c^2$ at 95% C.L. [4, 5].
On the other hand, precision electroweak measurements indirectly
constrain its mass to be less than 158 GeV/$c^2$ at 95% C.L. [6].

At the Tevatron $p\bar{p}$ collider, the Higgs boson is expected to be
produced mainly by gluon fusion, while the next most frequent production
channel is the associated production of Higgs and W bosons, WH. For Higgs
boson masses lower than 135 GeV/$c^2$, the decay $H \to b\bar{b}$ has the
largest branching fraction [7]. The production rate of $b\bar{b}$ pairs
from QCD processes is many orders of magnitude larger than that of Higgs
boson production, making the analysis of the process
$gg \to H \to b\bar{b}$ nonviable. Associated production
$q\bar{q} \to WH$ with the W boson decaying leptonically gives a cleaner
signal because requiring a lepton helps to distinguish it from the
multijet QCD background [8].

Several searches for a low-mass Higgs boson at the CDF and D0 experiments
are combined in order to maximize sensitivity [5]. In that combination,
the search in the $\ell\nu b\bar{b}$ final state has proven to be the most
sensitive input and therefore carries the most weight in the combination.
Optimizations in this analysis can therefore have an important impact on
the ultimate sensitivity of the Tevatron experiments to the Higgs boson.

Recently, the experiments at the Large Hadron Collider (LHC) have
obtained enough data to produce search results of similar sensitivity to
the Tevatron experiments in the low mass region [9]. However, at the LHC
the most sensitive low mass search is in the diphoton final state [10],
and searches for $H \to b\bar{b}$ will take some time before they reach
the sensitivity of the Tevatron combination in this channel [11]. In that
sense, the Tevatron and LHC are quite complementary in that both will
provide important information in the search for a low-mass Higgs boson
over the next few years.

In this Letter, we describe a search for the Higgs boson in the final
state where the H is produced in association with a W boson, the Higgs
boson decays to $b\bar{b}$, and the W decays to an electron or muon and
its associated neutrino. This final state has been investigated before by
both Tevatron experiments, CDF and D0 [12, 13]. Here we present a new
search in a data sample corresponding to an integrated luminosity of
5.6 fb$^{-1}$ and using an optimized discriminant output distribution.

Finding evidence for Higgs boson production in association with a W boson
is extremely difficult since the expected production rate is much lower
than that of other processes with the same final state, for example
$W + b\bar{b}$ and top quark processes. Some of the main challenges of
the analysis are the identification and the estimation of these and other
background processes and the development of strategies to reduce their
contribution while retaining high signal efficiency.

The background processes contributing to the WH final states are
$W + b\bar{b}$, $W + c\bar{c}$, $t\bar{t}$, single top, Z + jets,
dibosons ($W^+W^-$, WZ, and ZZ), W + jets events, where a jet not
originating from a b quark has been misidentified as a heavy flavor jet,
and non-W events where a jet is misidentified as a lepton. These
processes have characteristics which differ from those of WH production
that will be used to discriminate them from the signal. The background
rates are estimated from a combination of simulated and observed events.
To distinguish signal from background events a matrix element technique
[14, 15] is applied, in which event probability densities for the signal
and background hypotheses are calculated and used to create a powerful
discriminator. This method was used as part of the observation of single
top production [16] and many other analyses within the CDF collaboration,
such as the measurement of the WW + WZ cross section [17], the
measurement of the top quark mass [18], the search for SM Higgs boson
production in the WW decay channel [19], and the measurement of the WW
production cross section [20].

This paper is organized as follows. Section II briefly describes the
CDF II detector [21, 22], the apparatus used to collect the observed
events used in this analysis. In Section III, the identification of the
particles and observables that make up the WH final state is presented.
Section IV describes the event selection. Identifying b hadrons in jets
is essential, and the two algorithms used to identify b jets are
presented in Section V. The signal and background signatures are
discussed in Sections VI and VII, respectively, together with the method
used to estimate the total number of events and the background
composition. The matrix element method is described in detail in
Section VIII. A discussion of systematic uncertainties is included in
Section IX. Finally, the results and conclusions of the analysis are
presented in Sections X and XI.



II. THE CDF II DETECTOR 

The Collider Detector at Fermilab (CDF II) [21, 22] is situated at one of
the two collision points of the Tevatron $p\bar{p}$ collider. It is a
general purpose detector designed to study the properties of these
collisions. The detector has both azimuthal and forward-backward
symmetry. Since the CDF II detector has a barrel-like shape, we use a
cylindrical coordinate system $(r, \phi, z)$. The origin is located at
the center of the detector, $r$ is the radial distance from the beamline,
and the $z$-axis lies along the nominal direction of the proton beam
(toward east). Spherical coordinates $(\phi, \theta)$ are also commonly
used, where $\phi$ is the azimuthal angle around the beam axis and
$\theta$ is the polar angle defined with respect to the proton beam
direction. Pseudorapidity $\eta$ is defined as
$\eta = -\ln[\tan(\theta/2)]$. The transverse energy and momentum of a
particle are defined as $E_T = E \sin\theta$ and $p_T = p \sin\theta$,
respectively. A diagram of the CDF II detector is shown in Fig. 1. A
quadrant of the detector is cut out to expose the different subdetectors.
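
The coordinate definitions above can be illustrated with a minimal Python
sketch. The function names and the sample values (a 50 GeV deposit at a
polar angle of 30 degrees) are hypothetical and are not taken from the
analysis.

```python
import math

def pseudorapidity(theta):
    """eta = -ln[tan(theta/2)], with theta in radians from the proton beam."""
    return -math.log(math.tan(theta / 2.0))

def transverse(value, theta):
    """Transverse component: E_T = E sin(theta) or p_T = p sin(theta)."""
    return value * math.sin(theta)

# A particle emitted perpendicular to the beam has eta = 0.
eta_90 = pseudorapidity(math.pi / 2)

# Hypothetical 50 GeV energy deposit at theta = 30 degrees.
et = transverse(50.0, math.pi / 6)
```

Large values of |eta| correspond to directions close to the beam line,
which is why the detector coverage below is quoted in terms of |eta|.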

The CDF II detector consists of three primary subsys- 
tems: The innermost part of the detector is the track- 
ing system, which contains silicon microstrip detectors 
and the Central Outer Tracker (COT), an open cell drift 
chamber, inside a superconducting solenoid which gen- 
erates a 1.4 T magnetic field parallel to the beam axis. 

FIG. 1: A cutaway view of the CDF II detector, with a quadrant cut out to
expose the different subdetectors.



These detector systems are designed to reconstruct the trajectories of
charged particles and precisely measure their momenta. The silicon
detectors provide excellent impact parameter, azimuthal angle, and $z$
resolution [23-25]. For example, the typical intrinsic hit resolution of
the silicon detector is 11 μm. The transverse impact parameter (distance
of closest approach of a track to the beam line in the transverse plane)
resolution is ~40 μm, of which approximately 35 μm is due to the
transverse size of the Tevatron interaction region. The entire system
reconstructs tracks in three dimensions with the precision needed to
identify displaced vertices associated with b and c hadron decays. The
COT [26] provides excellent curvature and angular resolution, with
coverage for $|\eta| < 1$. The COT has a transverse momentum resolution
of $\sigma_{p_T}/p_T^2 = 0.0015\ [\mathrm{GeV}/c]^{-1}$, which improves
to $0.0007\ [\mathrm{GeV}/c]^{-1}$ [22] when the silicon detectors are
included. The tracking efficiency of the COT is nearly 100% in the range
$|\eta| < 1$, and the coverage is extended to $|\eta| < 1.8$ by including
the silicon detectors.

Outside of the solenoid are the calorimeters [27-29], which measure the
energy of particles that shower when interacting with matter. The
calorimeter is segmented into projective towers, and each tower is
divided into inner electromagnetic and outer hadronic sections. This
facilitates separation of electrons and photons from hadrons by the
energy deposition profiles as particles penetrate from inner to outer
sections. The full array has an angular coverage of $|\eta| < 3.6$. The
central region, $|\eta| < 1.1$, is covered by the central electromagnetic
calorimeter and the central hadron calorimeter. The central calorimeters
have resolutions of
$\sigma(E)/E = 13.5\%/\sqrt{E \sin\theta} \oplus 2\%$ and
$\sigma(E)/E = 50\%/\sqrt{E} \oplus 3\%$ (with $E$ in GeV) for the
electromagnetic and hadronic calorimeters, respectively. The forward
region, $1.1 < |\eta| < 3.6$, is covered by the end-plug electromagnetic
calorimeter and the end-plug hadron calorimeter, with resolutions of
$\sigma(E)/E = 16\%/\sqrt{E} \oplus 1\%$ and
$\sigma(E)/E = 80\%/\sqrt{E} \oplus 5\%$ (with $E$ in GeV) for the plug
electromagnetic and hadronic calorimeters, respectively.
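
As a rough numerical illustration of these resolution formulas, the
sketch below evaluates the fractional resolution for each calorimeter.
It assumes that the two terms are combined in quadrature and that the
plug hadronic stochastic term is 80%; the function names are
hypothetical.

```python
import math

def frac_resolution(stochastic, constant, scale_gev):
    """sigma(E)/E: stochastic/sqrt(scale) and constant added in quadrature."""
    return math.hypot(stochastic / math.sqrt(scale_gev), constant)

def central_em(energy, theta):
    # The central electromagnetic term scales with E sin(theta), i.e. E_T.
    return frac_resolution(0.135, 0.02, energy * math.sin(theta))

def central_had(energy):
    return frac_resolution(0.50, 0.03, energy)

def plug_em(energy):
    return frac_resolution(0.16, 0.01, energy)

def plug_had(energy):
    # Assumes an 80% stochastic term for the plug hadron calorimeter.
    return frac_resolution(0.80, 0.05, energy)
```

For a given energy the electromagnetic sections are markedly more
precise than the hadronic ones, for example `central_em(50.0, math.pi / 2)`
is smaller than `central_had(50.0)`.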

Finally, outside of the calorimeters are the muon chambers, which provide
muon detection in the range $|\eta| < 1.5$. The muon detectors at CDF
[21] make use of single wire drift chambers as well as scintillator
counters for fast timing. For the analyses presented in this article,
muons are detected in four separate subdetectors. Muons with
$p_T > 1.4$ GeV/$c$ penetrating the five absorption lengths of the
calorimeter are detected in the four layers of planar multi-wire drift
chambers of the central muon detector (CMU) [30]. Behind an additional
60 cm of steel, a second set of four layers of drift chambers, the
central muon upgrade (CMP) [31], detects muons with $p_T > 2.2$ GeV/$c$.
The CMU and CMP cover the same part of the central region,
$|\eta| < 0.6$. The central muon extension (CMX) [31] extends the
pseudorapidity coverage of the muon system from 0.6 to 1.0 and thus
completes the coverage over the full fiducial region of the COT. Muons
in the $|\eta|$ range from 1.0 to 1.5 of the forward region are detected
by the barrel muon chambers.



III. DATA SAMPLE AND EVENT 
RECONSTRUCTION 

The data set used in this analysis comes from $p\bar{p}$ collisions at a
center-of-mass energy of $\sqrt{s} = 1.96$ TeV recorded by the CDF II
detector between March 2002 and February 2010. The CDF experiment
utilizes a three-level trigger system [32-34] to reduce the 1.7 MHz beam
crossing rate to ~200 Hz. The first two levels of the trigger system are
custom hardware (the second level also has a software component) and the
third consists of a farm of computers running a fast version of the
offline event reconstruction algorithms.

WH events in the lepton + jets channel are charac- 
terized by the presence of an electron or muon with high 
transverse energy, large missing transverse energy result- 
ing from the undetected neutrino, and two high energy b 
jets (see Fig. 2). 




FIG. 2: Feynman diagram showing the final states of the 
WH process, with leptonic W boson decays. The final state 
contains a charged lepton, a neutrino, and two b quarks. 

The data sample used was collected by two trigger strategies, one based
on the selection of a high transverse momentum lepton (electron or muon)
and another one based on missing transverse energy ($\not\!\!E_T$,
defined in Section III E) + jets.

The total integrated luminosity is 5.6 fb$^{-1}$ for lepton-based
triggered events and 5.1 fb$^{-1}$ for muon candidates collected by the
$\not\!\!E_T$ + jets trigger. The different luminosities arise from the
different detector conditions necessary for each trigger. Electrons
reconstructed in the central and end-plug electromagnetic calorimeters
are referred to as CEM and PHX electrons, respectively. Muons
reconstructed in the central region by the CMU and the CMP detectors are
referred to as CMUP muons. Muons detected by the CMX detector are
referred to as CMX muons. CEM, PHX, CMUP, and CMX leptons are commonly
known as tight leptons, and the muons collected by the
$\not\!\!E_T$ + jets trigger are known as extended muon coverage (EMC)
muons. In this section, we briefly discuss the lepton identification
requirements, the reconstruction of jets, and the calculation of
$\not\!\!E_T$.



A. Electron identification 

High-$p_T$ electrons traversing the CDF II detector are expected to leave
a track in both the silicon detector and the COT. Subsequently, the
electrons will deposit most of their energy into the central or plug
electromagnetic calorimeters. The central electron trigger begins by
requiring a COT track with $p_T > 9$ GeV/$c$ that extrapolates to an
energy cluster of three central electromagnetic calorimeter towers with
$E_T > 18$ GeV. Several cuts are then successively applied in order to
improve the purity of the electron selection. The reconstructed track
with $p_T > 9$ GeV/$c$ must match to an electromagnetic calorimeter
cluster with $E_T > 20$ GeV. Furthermore, we require the ratio of
hadronic energy to electromagnetic energy, $E_{HAD}/E_{EM}$, to be less
than $0.055 + 0.00045 \times E/\mathrm{GeV}$ and the ratio of the energy
of the cluster to the momentum of the track, $E/pc$, to be smaller than
2.0 for track momenta below 50 GeV/$c$.
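
The offline central electron requirements listed above can be sketched
as a sequence of cuts. This is only an illustration: the
`ElectronCandidate` container and its field names are hypothetical, and
the sketch assumes that the $E$ in the $E_{HAD}/E_{EM}$ threshold is the
cluster energy.

```python
from dataclasses import dataclass

@dataclass
class ElectronCandidate:
    track_pt: float    # transverse momentum of the matched COT track, GeV/c
    track_p: float     # total momentum of the matched track, GeV/c
    cluster_et: float  # transverse energy of the EM cluster, GeV
    cluster_e: float   # total energy of the EM cluster, GeV
    e_had: float       # hadronic energy behind the cluster, GeV
    e_em: float        # electromagnetic energy of the cluster, GeV

def passes_cem_cuts(c):
    """Apply the central electron cuts quoted in the text, in order."""
    if c.track_pt <= 9.0:
        return False
    if c.cluster_et <= 20.0:
        return False
    # The hadronic leakage threshold loosens slowly with cluster energy.
    if c.e_had / c.e_em >= 0.055 + 0.00045 * c.cluster_e:
        return False
    # The E/p requirement is applied only for track momenta below 50 GeV/c.
    if c.track_p < 50.0 and c.cluster_e / c.track_p >= 2.0:
        return False
    return True

electron_like = ElectronCandidate(35.0, 40.0, 38.0, 42.0, 1.0, 41.0)
hadron_like = ElectronCandidate(35.0, 40.0, 38.0, 42.0, 10.0, 32.0)
```

In this example `electron_like` passes all cuts, while `hadron_like`
fails the $E_{HAD}/E_{EM}$ requirement.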

Electron candidates in the forward direction ($|\eta| > 1.1$, PHX) are
defined by a cluster in the plug electromagnetic calorimeter with
$E_T > 20$ GeV and $E_{HAD}/E_{EM} < 0.05$. The cluster position and the
primary vertex position are combined to form a trajectory on which the
tracking algorithm utilizes hits in the silicon tracker.

CEM candidates are rejected if an additional high-$p_T$ track is found
which forms a common vertex with the track of the electron candidate and
has the opposite electric charge, since these events are likely to stem
from the conversion of a photon.

Note that leptonically decaying tau leptons make up a small fraction of
our signal acceptance, since in this case the tau can be identified as an
isolated electron or muon.

Figure 3(a) shows the $(\eta, \phi)$ distributions of CEM and PHX
electron candidates.



B. Muon identification 

Muons are characterized by a track in the tracking system, energy
deposited in the calorimeter consistent with that of a minimum ionizing
particle, and, in cases where they are fiducial to muon chambers, a
track, called a stub, that they will often leave in these detectors. The
third-level muon trigger requires a COT track with $p_T > 18$ GeV/$c$
matched to a track segment in the muon chambers.

Muon identification requires an isolated COT track ($p_T > 20$ GeV/$c$)
that extrapolates to a track segment in the muon chambers. Track
segments must be detected either in the CMU and the CMP simultaneously
(CMUP muons), or in the CMX (CMX muons) for triggered muons. Several
additional requirements are imposed in order to minimize contamination
from hadrons punching through the calorimeter, decays in flight of
charged hadrons, and cosmic rays. The energy deposition in the
electromagnetic and hadronic calorimeters has to be small, as expected
from a minimum-ionizing particle. To reject cosmic-ray muons and muons
from in-flight decays of long-lived particles such as $K_S^0$ and
$\Lambda$, the impact parameter of the track is required to be less than
0.2 cm if there are no silicon hits on the muon candidate's track, and
less than 0.02 cm if there are silicon hits. The remaining cosmic rays
are reduced to a negligible level by taking advantage of their
characteristic track timing and topology.

In order to add acceptance for events containing muons which are not
triggered on directly, several additional muon types are taken from the
extended muon coverage (EMC) provided by triggers based on
$\not\!\!E_T$ + jets requirements ($\not\!\!E_T > 35$ GeV and the
presence of at least two jets). Events passing the $\not\!\!E_T$ + jets
trigger are also required to have two sufficiently separated jets:
$\Delta R_{jj} > 1$, where
$\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$. Furthermore, one of
the jets must be central, with $|\eta| < 0.9$, and jets are required to
have transverse energies above 25 GeV. These additional jet-based
requirements remove the dependence of the trigger efficiency on jet
observables so that it can be modeled by the $\not\!\!E_T$ alone. The
details of the EMC types and selection are included in Ref. [35].
Figure 3(b) shows the $(\eta, \phi)$ distribution of all muon
candidates.
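
The jet separation requirement can be illustrated with a short Python
sketch. The azimuthal difference is wrapped into (-pi, pi] before it
enters $\Delta R$. The function `passes_jet_requirements` is a
hypothetical reading of the requirements above (a pair of jets above
25 GeV with $\Delta R_{jj} > 1$, at least one of them central), not the
trigger implementation itself.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)  # Python's % gives a value in [0, 2pi)
    return d - 2.0 * math.pi if d > math.pi else d

def delta_r(eta1, phi1, eta2, phi2):
    """Delta R = sqrt((Delta eta)^2 + (Delta phi)^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def passes_jet_requirements(jets, et_min=25.0, dr_min=1.0, eta_central=0.9):
    """jets: list of (E_T in GeV, eta, phi) tuples."""
    good = [j for j in jets if j[0] > et_min]
    for a in range(len(good)):
        for b in range(a + 1, len(good)):
            ja, jb = good[a], good[b]
            separated = delta_r(ja[1], ja[2], jb[1], jb[2]) > dr_min
            central = abs(ja[1]) < eta_central or abs(jb[1]) < eta_central
            if separated and central:
                return True
    return False
```

Two jets on either side of the $\phi = 0$ boundary are close in azimuth,
so the wrapping step matters: `delta_phi(0.1, 2 * math.pi - 0.1)` is 0.2,
not about -6.08.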



C. Lepton identification efficiencies 

The efficiency of lepton identification is measured using
$Z \to e^+e^-$ and $Z \to \mu^+\mu^-$ samples. A pure sample of leptons
can be obtained by selecting events where the invariant mass of two
high-$p_T$ tracks is near the mass of the Z boson and one track passed
the trigger and tight lepton identification selection. The other track
can then be examined to see if it also passed the identification cuts to
study the efficiency. The same procedure can be applied to simulated
Monte Carlo (MC) events and to observed events in the detector, and small
differences in the efficiencies are observed due to imperfect detector
modeling. To correct for this difference, a correction factor is applied
to the efficiencies of Monte Carlo events based on the ratio of the
lepton identification efficiency calculated from observed events to the
efficiency found in Monte Carlo events. The correction factors for the
lepton identification are shown in Table I.

FIG. 3: Distributions in $\eta$-$\phi$ space of the electron (a) and muon
(b) selection categories, showing the coverage of the detector that each
lepton type provides. The trigger based on $\not\!\!E_T$ plus jets is
used to fill in the gaps in the muon trigger coverage.

Lepton type    Correction factor
CEM            0.977 ± 0.001
PHX            0.919 ± 0.002
CMUP           0.894 ± 0.002
CMX            0.952 ± 0.002
EMC            0.882 ± 0.003 to 1.070 ± 0.020

TABLE I: Correction factors applied to the Monte Carlo events to correct
the lepton identification efficiencies. Since there are different
sub-categories within the EMC category, we quote the range of variation.
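
The correction factor described above is a ratio of two efficiencies. A
minimal sketch follows, assuming simple binomial uncertainties and
uncorrelated data and Monte Carlo samples; the probe counts are
hypothetical and do not reproduce the values in Table I.

```python
import math

def efficiency(n_pass, n_total):
    """Fraction of probe tracks passing identification, with binomial error."""
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err

def correction_factor(data_counts, mc_counts):
    """Ratio of the efficiency in observed events to that in simulation."""
    eff_d, err_d = efficiency(*data_counts)
    eff_m, err_m = efficiency(*mc_counts)
    ratio = eff_d / eff_m
    # Relative uncertainties added in quadrature (assumes no correlation).
    err = ratio * math.hypot(err_d / eff_d, err_m / eff_m)
    return ratio, err

# Hypothetical probe counts: (passing, total) for data and for Monte Carlo.
factor, factor_err = correction_factor((8000, 10000), (8200, 10000))
```

A factor below one means the simulation overestimates the identification
efficiency, so simulated yields are scaled down accordingly.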



D. Jet reconstruction and corrections 

Jets consist of a shower of particles originating from the hadronization
of highly energetic quarks or gluons. Jets used in this analysis are
reconstructed using a cone algorithm [36] by summing the transverse
calorimeter energy $E_T$ in a cone of radius $\Delta R < 0.4$, for which
the $E_T$ of each tower is calculated with respect to the primary vertex
$z$ coordinate of the event. The calorimeter towers belonging to any
electron candidate are not used by the jet clustering algorithm. The
energy of each jet is corrected [36] for the $\eta$ dependence and the
nonlinearity of the calorimeter response. The jet energies are also
adjusted by subtracting the average extra deposition of energy from
additional inelastic $p\bar{p}$ collisions on the same beam crossing as
the triggered event.

E. Missing transverse energy reconstruction 

The presence of neutrinos in an event is inferred by an imbalance in the
transverse components of the energy measurements in the calorimeter. The
missing $E_T$ vector ($\vec{\not\!\!E}_T$) is defined by:

$\vec{\not\!\!E}_T = -\sum_i E_T^i \hat{n}_i$,    (1)

where $i$ is the index for calorimeter tower number with $|\eta| < 3.6$,
and $\hat{n}_i$ is a unit vector perpendicular to the beam axis and
pointing at the $i$th calorimeter tower. $\not\!\!E_T$ also refers to the
magnitude $|\vec{\not\!\!E}_T|$. The $\not\!\!E_T$ calculation is based
on uncorrected tower energies and is then corrected based on the jet
energy corrections of all of the jets in the event. The $\not\!\!E_T$ is
also corrected for the muons, since they traverse the calorimeters
without showering. The transverse momenta of all identified muons are
added to the measured transverse energy sum and the average ionization
energy is removed from the measured calorimeter energy deposits.
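
Equation (1) amounts to a negative vector sum over calorimeter towers in
the transverse plane. The following minimal sketch shows only the
uncorrected tower sum; the jet energy and muon corrections described
above are not included, and the tower tuple layout is an assumption made
for illustration.

```python
import math

def missing_et(towers, eta_max=3.6):
    """towers: iterable of (E_T in GeV, eta, phi) tuples.

    Returns (met_x, met_y, magnitude) from the negative vector sum of the
    tower transverse energies with |eta| < eta_max.
    """
    sum_x = 0.0
    sum_y = 0.0
    for et, eta, phi in towers:
        if abs(eta) < eta_max:
            sum_x += et * math.cos(phi)
            sum_y += et * math.sin(phi)
    return -sum_x, -sum_y, math.hypot(sum_x, sum_y)

# Hypothetical event: the third tower lies outside |eta| < 3.6 and is ignored.
met_x, met_y, met = missing_et([
    (30.0, 0.2, 0.0),
    (10.0, -0.5, math.pi),
    (5.0, 4.0, math.pi / 2),
])
```

In this example the 30 GeV and 10 GeV towers are back to back, leaving a
20 GeV imbalance that points opposite to the larger deposit.
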

IV. EVENT SELECTION

The selection before identifying any jet as a b jet is referred to as
pretag and only requires the presence of an electron or muon,
$\not\!\!E_T > 20$ GeV (25 GeV in the case of forward electrons), and
two or three jets with corrected $E_T > 20$ GeV and $|\eta| < 2.0$. At
leading order one would expect to have only two high-$p_T$ jets in the
final state of WH signal events. However, by allowing for the presence
of a third jet, signal acceptance is improved by about 25% due to extra
jets mostly produced by gluon radiation in the initial or final state.

In order to reduce the Z + jets, top, and WW/WZ background rates, events
with more than one lepton are removed. If one of the leptons is not
identified correctly, $Z \to \ell^+\ell^-$ events still remain. To
remove such events, the invariant mass of the lepton and any track with
opposite charge must not be in the Z boson mass window
$76 < m_{\ell,\mathrm{track}} < 106$ GeV/$c^2$.
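
The Z boson mass-window veto can be sketched as follows. The invariant
mass is computed in the massless approximation,
$m^2 = 2 p_{T1} p_{T2} [\cosh(\Delta\eta) - \cos(\Delta\phi)]$, which is
an assumption adopted here for brevity; the tuple layout and function
names are hypothetical.

```python
import math

def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Two-particle invariant mass in GeV/c^2, neglecting particle masses."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))

def in_z_window(lepton, tracks, low=76.0, high=106.0):
    """lepton and tracks are (p_T, eta, phi, charge) tuples.

    Returns True if the lepton paired with any opposite-charge track has
    an invariant mass inside the Z boson window, so the event is removed.
    """
    pt, eta, phi, charge = lepton
    for t_pt, t_eta, t_phi, t_charge in tracks:
        if t_charge * charge < 0:
            if low < invariant_mass(pt, eta, phi, t_pt, t_eta, t_phi) < high:
                return True
    return False
```

For example, a 45.6 GeV/$c$ lepton back to back with an oppositely
charged 45.6 GeV/$c$ track gives a mass of 91.2 GeV/$c^2$ and is vetoed.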

The non-W background consists of multijet events which do not contain W
bosons; a description of these background events can be found in
Section VII B. This non-W background is reduced by applying additional
selection requirements which are based on the assumption that these
events do not have large $\not\!\!E_T$ from an escaping neutrino, but
rather that the $\not\!\!E_T$ that is observed comes from lost or
mismeasured jets. These requirements were developed in the framework of
the single top observation and are described in detail in [37].



V. b-JET TAGGING ALGORITHMS

The events selected by the above criteria are dominated by the
production of W bosons in association with jets. In order to improve the
signal to background ratio for WH events, at least one of the jets in
the event is required to be produced by a b quark. Identifying jets
originating from b quarks helps to reduce the background from non-W and
W + light flavor (W + LF) events. Therefore, the last step of the event
selection is the requirement of the presence of at least one b-tagged
jet identified using the SecVtx algorithm [38]. In order to increase the
acceptance for events with two tagged b jets, an additional b-tagging
algorithm that relies on high-impact-parameter tracks within jets, Jet
Probability [39], is used. These two tagging algorithms are based on the
same principle: the fact that b quarks have a relatively long lifetime
and high mass. Therefore, b hadrons formed during the hadronization of
the initial b quark can travel a significant distance (on the order of a
few millimeters) before decaying to lighter hadrons. The displacement of
the b hadron decay point can then be detected either directly by
vertexing the tracks or indirectly by studying the impact parameters of
tracks.



A. Secondary Vertex Tagger 

The SecVtx algorithm looks inside the jet cone to construct secondary
vertices using tracks displaced from the primary vertex. The tracks are
distinguished by their large impact parameter significance
($|d_0/\sigma_{d_0}|$), where $d_0$ and $\sigma_{d_0}$ are the impact
parameter and its overall uncertainty. The tracks are fit to a common
vertex using a two-pass approach. In the first pass, applying loose
track selection criteria ($p_T > 0.5$ GeV/$c$ and
$|d_0/\sigma_{d_0}| > 2.5$), the algorithm attempts to reconstruct a
secondary vertex which includes at least three tracks (at least one of
the tracks must have $p_T > 1$ GeV/$c$). If no secondary vertex is
found, the algorithm uses tighter track selection requirements
($p_T > 1$ GeV/$c$ and $|d_0/\sigma_{d_0}| > 3.0$) and attempts to
reconstruct a two-track vertex in a second pass. If either pass is
successful, the transverse distance ($L_{xy}$) from the primary vertex
of the event is calculated along with the associated uncertainty
$\sigma_{L_{xy}}$, which includes the uncertainty on the primary vertex
position. Jets are considered as tagged by requiring a displaced
secondary vertex within the jet. Secondary vertices are accepted if the
transverse decay length significance ($L_{xy}/\sigma_{L_{xy}}$) is
greater than or equal to 7.5.

$L_{xy}$ is defined to be positive when the secondary vertex is
displaced in the same direction as the jet, and the jet is positively
tagged. A negative value of $L_{xy}$ indicates an incorrect b-tag
assignment due to mis-reconstructed tracks. In this case the tag is
called negative. These negative tags are useful for estimating the rate
of incorrectly b-tagged jets, as explained in Section V C.
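
The sign convention and the significance requirement can be sketched in
a few lines of Python. Two assumptions are made that the text does not
state explicitly: the signed $L_{xy}$ is taken as the projection of the
transverse vertex displacement onto the jet direction, and the negative
tag uses the mirrored threshold of -7.5. All names and numbers are
hypothetical.

```python
import math

def signed_lxy(primary_xy, secondary_xy, jet_phi):
    """Transverse displacement projected onto the jet axis, in cm."""
    dx = secondary_xy[0] - primary_xy[0]
    dy = secondary_xy[1] - primary_xy[1]
    return dx * math.cos(jet_phi) + dy * math.sin(jet_phi)

def classify_tag(lxy, sigma_lxy, threshold=7.5):
    """Return 'positive', 'negative', or 'untagged' for a secondary vertex."""
    significance = lxy / sigma_lxy
    if significance >= threshold:
        return "positive"
    if significance <= -threshold:  # mirrored threshold: an assumption
        return "negative"
    return "untagged"

# Hypothetical vertex displaced by 0.3 cm along x, with sigma = 0.02 cm.
along_jet = signed_lxy((0.0, 0.0), (0.3, 0.0), jet_phi=0.0)
against_jet = signed_lxy((0.0, 0.0), (0.3, 0.0), jet_phi=math.pi)
```

The same vertex yields a positive tag for a jet pointing along the
displacement and a negative tag for a jet pointing the opposite way.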



B. Jet Probability Tagger 

The Jet Probability b-tagging algorithm is also used. Unlike SecVtx,
this algorithm does not explicitly require that the tracks form a
vertex. Instead, it uses tracks associated with a jet to determine the
probability for these to come from the primary vertex of the interaction
[39]. The calculation of the probability is based on the impact
parameters of the tracks in the jet and their uncertainties. The impact
parameter is assigned a positive or negative sign depending on the
position of the track's point of closest approach to the primary vertex
with respect to the jet direction. It is positive (negative) if the
angle $\phi$ between the jet axis and the line connecting the primary
vertex and the track's point of closest approach to the primary vertex
itself is smaller (bigger) than $\pi/2$. By construction, the
probability for tracks originating from the primary vertex is uniformly
distributed from 0 to 1. For a jet coming from heavy flavor
hadronization, the distribution peaks at 0, due to tracks from
long-lived particles that have a large impact parameter with respect to
the primary vertex. To be considered as tagged, the jets are required to
have a value of the Jet Probability variable ($P_J$) less than 0.05
($P_J < 5\%$).
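
The text does not give the formula used to combine per-track
probabilities into $P_J$. As an illustration only, the sketch below uses
the standard combination for $N$ independent, uniformly distributed
probabilities, $P_J = \Pi \sum_{k=0}^{N-1} (-\ln \Pi)^k / k!$ with
$\Pi$ the product of the track probabilities, which is uniform between 0
and 1 by construction. Whether this matches the implementation of
Ref. [39] in every detail is an assumption, and the track probabilities
used here are hypothetical.

```python
import math

def jet_probability(track_probs):
    """Combine per-track probabilities into a single jet-level probability."""
    if not track_probs:
        raise ValueError("at least one track probability is required")
    product = math.prod(track_probs)
    if product <= 0.0:
        return 0.0
    log_term = -math.log(product)
    n_tracks = len(track_probs)
    total = sum(log_term ** k / math.factorial(k) for k in range(n_tracks))
    return product * total

def is_jp_tagged(track_probs, cut=0.05):
    """Tag the jet if P_J is below the 5% requirement quoted in the text."""
    return jet_probability(track_probs) < cut
```

With a single track the jet probability reduces to the track
probability, and a jet with several low-probability tracks, such as
`[0.01, 0.02, 0.5]`, falls below the 5% cut.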

C. Tagging efficiencies and mistag rates 

The b-tagging efficiencies are needed to estimate the yields of signal
and background events, which are obtained from Monte Carlo simulations.
The efficiency for identifying a heavy flavor jet is different in
simulated events and in observed events. It is typically overestimated
by Monte Carlo models. To correct for this effect, a scale factor is
applied to the Monte Carlo tagging efficiency.

The method used to measure the tagging efficiency for heavy flavor jets
is described in detail in [38]. To measure the tagging efficiency in
observed events, a calibration sample enriched in heavy flavor is used.
This sample is selected by requiring electrons with $p_T > 8$ GeV/$c$.
Along with the electron we require the presence of two jets, the
"electron jet" and the "away jet". The electron jet is required to have
$E_T > 15$ GeV (including the energy of the electron) and to be within
0.4 of the electron in $\eta$-$\phi$ space (in other words, the electron
is within the jet cone), and is presumed to contain the decay products
of a heavy flavor hadron. The away jet is required to have
$E_T > 15$ GeV and $|\eta| < 1.5$, and it must be approximately
back-to-back with the electron jet ($\Delta\phi > 2$ rad). To measure
the tagging efficiency of the heavy flavor electron jets we employ a
double-tag technique, requiring that the away jet be tagged by the
corresponding tagging algorithm. This enhances the heavy flavor fraction
of the electron jets and reduces the dependence on the heavy flavor
fraction. The tagging efficiency is also measured for simulated jets by
using a Monte Carlo sample similar to the calibration sample. The
tagging efficiency ratio of observed events to Monte Carlo simulated
events is called the tagging scale factor (SF). The tagging scale
factors used in this analysis are summarized in Table II for
$P_J < 5\%$ and SecVtx [40]. The uncertainties shown are statistical and
systematic.



TABLE II: Tagging scale factors and their uncertainties for $P_J < 5\%$
and SecVtx.

                $P_J < 5\%$        SecVtx
Scale factor    0.806 ± 0.038      0.95 ± 0.04



The probability of misidentifying a light jet as a heavy-flavor jet
("mistag") is closely related to the rate of negatively tagged jets. The
negative tag rate is measured in an inclusive-jet sample collected by
triggers with various jet $E_T$ thresholds. This tag rate is then
parametrized as a six-dimensional tag-rate matrix. The parametrization
of the mistag rate is done as a function of three jet variables:
transverse energy of the jet ($E_T$), the number of tracks in the jet
($N_{trk}$), and the pseudorapidity of the jet ($\eta$); and three event
variables: the sum of the transverse energies of all jets in the event
($\sum E_T$), the number of reconstructed vertices in the event
($N_{vtx}$), and the $z$-position of the primary vertex ($z_{vtx}$).
These parametrized rates are used to obtain the probability that a given
jet will be negatively tagged. It is assumed that the negative tags are
due to detector resolution effects only, while positive tags consist of
a mixture of heavy flavor tags, resolution-based mistags of light-flavor
jets, and mistags due to $K_S$'s, $\Lambda$'s, and nuclear interactions
with the detector material. The mistag rate is based on the negative tag
rate in the inclusive jet data, corrected for estimations of the other
contributions [40]. Typically, the mistag rate is of the order of a few
percent.



D. Splitting tagging categories 

As already mentioned above, the last step of the event selection is to
require the presence of at least one b-tagged jet using the SecVtx
algorithm. In order to gain sensitivity, both b-tagging algorithms are
used to assign events to one of three non-overlapping tagging
categories, each with a different signal to background ratio. The Jet
Probability tagger with the cut at 5% is less restrictive than SecVtx.
This means that the selection efficiency for real b jets is higher, but
it is accompanied by an increase in the background contribution of light
jets misidentified as heavy flavor jets. Some of the events that were
not tagged by the SecVtx algorithm are recovered by Jet Probability. The
addition of these events translates into a 5% improvement in the final
sensitivity of the analysis. Events are selected in the following order:
events in which two or more jets are tagged by the SecVtx algorithm
(SVSV events), events where only one jet is tagged by SecVtx and the
other one is tagged by the Jet Probability algorithm (SVJP events), and
events with only one jet tagged by SecVtx (in this case, none of the
other jets is tagged by either of the two algorithms, SVnoJP events).
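
The ordered assignment to the three tagging categories can be sketched
as follows. Each jet is represented by a hypothetical pair of booleans
(tagged by SecVtx, tagged by Jet Probability); the representation and
the function name are illustrative assumptions.

```python
def tag_category(jets):
    """Assign an event to SVSV, SVJP, or SVnoJP, checked in that order.

    jets: list of (secvtx_tagged, jp_tagged) boolean pairs, one per jet.
    Returns None if no jet is tagged by SecVtx.
    """
    n_secvtx = sum(1 for secvtx, _ in jets if secvtx)
    if n_secvtx >= 2:
        return "SVSV"
    if n_secvtx == 1:
        # Look for a Jet Probability tag on a jet not tagged by SecVtx.
        other_jp = any(jp for secvtx, jp in jets if not secvtx)
        return "SVJP" if other_jp else "SVnoJP"
    return None
```

Because the categories are tested in a fixed order and each event returns
at the first match, the three categories cannot overlap.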



VI. SIGNAL MODELING AND ACCEPTANCE 

Higgs boson events are modeled with the PYTHIA [41] Monte Carlo
generator using the CTEQ5L [42] parton distribution functions (PDFs).
They are combined with a parametrized response of the CDF II detector
[43] and tuned to the Tevatron underlying event data [44].

For this analysis, the Higgs boson mass region where the branching ratio
to $b\bar{b}$ is large is studied (Higgs boson masses between 100 and
150 GeV/$c^2$). Eleven signal MC samples are generated in this range,
$100 < m_H < 150$ GeV/$c^2$ in 5 GeV/$c^2$ increments.

The number of expected $WH \to \ell\nu b\bar{b}$ events is given by:

$N = \sigma_{p\bar{p} \to WH} \cdot \mathcal{B}(H \to b\bar{b}) \cdot \epsilon_{evt} \cdot \mathcal{L}_{int}$,    (2)

where $\sigma_{p\bar{p} \to WH}$ is the theoretically predicted cross
section of the WH process, $\mathcal{B}$ is the branching ratio of a
Higgs boson decaying to $b\bar{b}$, $\epsilon_{evt}$ is the event
detection efficiency, and $\mathcal{L}_{int}$ is the integrated
luminosity.

The SM predicted cross sections for WH production and the branching
ratios of a Higgs boson decaying to $b\bar{b}$ for the different Higgs
boson masses are calculated to next-to-leading order (NLO) [45] and are
quoted in Table III.

TABLE III: SM branching ratios ($H \to b\bar{b}$) and WH production
cross sections for all Higgs boson masses used in this analysis.

Higgs mass (GeV/$c^2$)    $\mathcal{B}(H \to b\bar{b})$    $\sigma$ (pb)
100                       0.812                            0.286
105                       0.796                            0.253
110                       0.770                            0.219
115                       0.732                            0.186
120                       0.679                            0.153
125                       0.610                            0.136
130                       0.527                            0.120
135                       0.436                            0.103
140                       0.344                            0.086
145                       0.256                            0.078
150                       0.176                            0.070



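The event detection efficiency, $\epsilon_{\mathrm{evt}}$, can be broken 
down into several factors: 

$$\epsilon_{\mathrm{evt}} = \epsilon_{z_0} \cdot \epsilon_{\mathrm{trigger}} \cdot \epsilon_{\mathrm{lepton\,Id}} \cdot \epsilon_{\mathrm{tag}} \cdot \epsilon_{\mathrm{acc}} \cdot \mathcal{B}(W \to \ell\nu_\ell), \tag{3}$$

where each term corresponds, respectively, to the z vertex 
cut ($|z_0| < 60$ cm fiduciality), triggers, lepton 
identification, b tagging, acceptance requirements, and the 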
branching ratio of the W boson decaying to a lepton and 
a neutrino. The event detection efficiency is estimated 
by performing the event selection on the samples of sim- 
ulated events. Control samples in the data are used to 
calibrate the efficiencies of the trigger, the lepton identi- 
fication, and the b tagging. These calibrations are then 
applied to the Monte Carlo samples we use. 
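
A minimal sketch of how the factorization in Eq. 3 and the data-driven calibrations could be combined is shown below. The factor names, the idea of representing each calibration as a multiplicative data/MC scale factor, and all numerical values are assumptions made for illustration only; none of them are taken from the analysis.

```python
import math


def event_efficiency(mc_factors, scale_factors=None):
    """Product of per-step efficiencies, each optionally scaled by a data/MC factor."""
    scale_factors = scale_factors or {}
    unknown = set(scale_factors) - set(mc_factors)
    if unknown:
        raise KeyError(f"scale factors given for unknown steps: {sorted(unknown)}")
    return math.prod(
        value * scale_factors.get(name, 1.0) for name, value in mc_factors.items()
    )


# Hypothetical placeholder values, chosen only to exercise the function.
mc_factors = {
    "z0": 0.96,
    "trigger": 0.95,
    "lepton_id": 0.80,
    "tag": 0.40,
    "acceptance": 0.30,
    "br_w_to_lnu": 0.11,
}
scale_factors = {"trigger": 0.98, "lepton_id": 0.97, "tag": 0.95}

print(f"illustrative event efficiency: {event_efficiency(mc_factors, scale_factors):.4f}")
```

Keeping the calibrations separate from the simulated efficiencies makes it easy to vary one scale factor at a time when studying systematic uncertainties.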

The predicted signal yields for the selected two- and 
three-jet events for each tagging category are estimated 
by Eq. 2 at each Higgs boson mass point. Tables IV 
(for two-jet events) and V (for three-jet events) show the 
number of expected WH events for each Higgs boson 
mass for an integrated luminosity of 5.6 fb$^{-1}$. 

TABLE IV: Summary of predicted number of signal events 
based on 5.6 fb$^{-1}$ of integrated luminosity with systematic 
and statistical uncertainties for each Higgs boson mass in 
2-jet events passing all event selection requirements. 

Higgs mass (GeV/$c^2$)   SVSV         SVJP         SVnoJP
100                      5.92±0.69    4.12±0.52    15.66±1.23
105                      5.50±0.64    3.76±0.47    14.11±1.11
110                      4.80±0.56    3.33±0.42    12.34±0.97
115                      4.06±0.48    2.80±0.35    10.27±0.81
120                      3.24±0.38    2.24±0.28    8.08±0.64
125                      2.65±0.31    1.86±0.23    6.59±0.52
130                      2.07±0.24    1.44±0.18    5.12±0.40
135                      1.49±0.17    1.07±0.13    3.70±0.29
140                      1.01±0.12    0.71±0.09    2.46±0.19
145                      0.70±0.08    0.50±0.06    1.69±0.13
150                      0.??±0.05    0.??±0.0?    1.00±0.08

TABLE V: Summary of predicted number of signal events 
based on 5.6 fb$^{-1}$ of integrated luminosity with systematic 
and statistical uncertainties for each Higgs boson mass in 
3-jet events passing all event selection requirements. 

Higgs mass (GeV/$c^2$)   SVSV         SVJP         SVnoJP
100                      1.43±0.17    1.10±0.15    3.36±0.27
105                      1.41±0.17    1.06±0.15    3.22±0.26
110                      1.29±0.15    0.98±0.13    3.00±0.24
115                      1.16±0.14    0.85±0.12    2.57±0.21
120                      0.95±0.11    0.71±0.10    2.11±0.17
125                      0.81±0.09    0.60±0.08    1.80±0.15
130                      0.68±0.08    0.49±0.07    1.44±0.12
135                      0.50±0.06    0.37±0.05    1.09±0.09
140                      0.35±0.04    0.26±0.04    0.76±0.06
145                      0.25±0.03    0.18±0.03    0.54±0.04
150                      0.1?±0.02    0.12±0.02    0.??±0.03



VII. BACKGROUND MODELING AND 
ESTIMATION 

Other production processes can mimic the $WH \to \ell\nu b\bar{b}$ 
final state. The main contribution comes from 
heavy-flavor production in association with a leptonic W 
boson ($Wb\bar{b}$, $Wc\bar{c}$, $Wc$). W + LF production also gives 
a significant contribution due to mistagged jets. Smaller 
contributions come from electroweak and top quark 
processes, $t\bar{t}$, single top, diboson production (WW, WZ, 
ZZ), or Z + jets, and non-W multijet production with 
misidentified leptons. 

In order to estimate the different background rates, a 
combination of Monte Carlo samples and observed events 
is used. The observed lepton + jets events consist of 
electroweak, top (single top and $t\bar{t}$), non-W production, 
and W + jets processes. Some background processes 
are estimated based on Monte Carlo simulations scaled 
to theoretical predictions of the cross section (such as 
$t\bar{t}$); some are purely data-based (non-W); and some 
require a combination of Monte Carlo and observed events 
(W + jets). The first step in the background estimate is 
to calculate the processes that can be reliably simulated 
using Monte Carlo techniques. Estimating the non-W 
fraction is the next step. Finally, the observed events 
that are not non-W, electroweak, or top quark processes 
are considered to be all W + jets events, where b-tag rate 
estimates from the Monte Carlo are used to estimate the 
contribution to the b-tagged signal region. Details on 
each step of this process are given in the sections below. 



A. Monte-Carlo based background processes 

Diboson events (WW, WZ and ZZ) can contribute to 
the tagged lepton + jets sample when one boson decays 
leptonically and the other decays into quarks (Fig. 4). In 
addition, top pair production in which one lepton (from 
Fig. 5 (a)) or two jets (from Fig. 5 (b)) were not 
reconstructed also constitutes an important background 
process. The diboson and $t\bar{t}$ simulated events are generated 
using the PYTHIA [41] Monte Carlo generator. There is also a 
contribution from single top quarks produced in association 
with a b quark, through s-channel (Fig. 6(a)) and t-channel 
(Fig. 6(b)) single top production. These events are 
generated using the MADEVENT [46] MC, and the parton 
showering is done with PYTHIA. Finally, the Z + jets 
process in which one lepton from Z boson decay is missed 
(Fig. 7(a)) can also contribute. Z + jets production is 
simulated using a combination of ALPGEN [47] matrix 
element generation and PYTHIA parton showering. 



FIG. 4: Feynman diagrams for diboson production (WW, 
WZ, ZZ), which provides a small background contribution 
to WH production. 



The numbers of events from these processes are pre- 
dicted based on theoretical and measured cross sections, 
the measured integrated luminosity, and the acceptances 
and tagging efficiencies derived from Monte Carlo sim- 
ulations in the same way as the WH process described 
in Section VI. The diboson cross sections are taken from 
the NLO calculations with MCFM [48]. For the Z + jets 
background, the Z + jets cross section times the branching 
ratio of Z to charged leptons is normalized to the 
value measured by CDF [49]. Predictions based on NLO 
calculations are also used for the $t\bar{t}$ and single top 
background processes [50, 51]. Top cross section predictions 
assume a top mass of 175 GeV/$c^2$. 

The total diboson (WW, WZ, ZZ), Z + jets, $t\bar{t}$, and 
single top quark predictions for each tagging category are 
shown in Tables VI (two-jet events) and VII (three-jet 
events). 




FIG. 5: Feynman diagrams of the $t\bar{t}$ background process to 
WH production. To pass the event selection, these events 
must have one charged lepton (a) or two hadronic jets (b) 
that go undetected. 

FIG. 6: Feynman diagrams showing the final states of the 
s-channel (a) and t-channel (b) processes, with leptonic W 
boson decays. Both final states contain a charged lepton, a 
neutrino, and two jets, at least one of which originates from 
a b quark. 



B. Non-W multijet events 

The non-W background process consists of events for 
which the lepton + $\not\!\!E_T$ signature is not due to the decay 
of a W boson but which instead have a fake isolated lepton 
and mismeasured $\not\!\!E_T$ (Fig. 7(b)). The main contribution to 
this source of background comes from QCD multijet 
production where a jet provides the signature of a lepton and 
the missing transverse energy is due to a mismeasurement 
of the jet energies. Semileptonic decays of b hadrons and 
misidentified photon conversions also contribute. Due to 
their instrumental nature, these processes cannot be 
simulated reliably. Therefore, samples of observed events are 
used to estimate the rates of these processes and model 
their kinematic distributions. 

Three different samples of observed events are used to 
model the non-W multijet contribution. One sample is 
based on events that fired the central electron trigger but 
failed at least two of the five identification cuts of the 
electron selection requirements that do not depend on 
the kinematic properties of the event, such as the 
fraction of energy in the hadronic calorimeter. This sample 
is used to estimate the non-W contribution from CEM, 
CMUP and CMX events. A second sample is formed 
from events that pass a generic jet trigger with transverse 
energy $E_T > 20$ GeV to model PHX events. These jets 
are additionally required to have a fraction of energy 
deposited in the electromagnetic calorimeter between 80% 
and 95%, and fewer than four tracks, to mimic electrons. 
A third sample, used to model the non-W background in 
EMC events, contains events that are required to pass the 
$\not\!\!E_T$ + jets trigger (see Section III) and contain a muon 
that passes all identification requirements but fails the 
isolation requirement. In this case, the isolation is 
defined as the ratio of the transverse energy surrounding the 
muon to the transverse energy of the muon. The 
pseudorapidity distribution of the objects chosen to model the 
falsely identified lepton must be consistent with that of 
the sample being modeled. The first sample works well for 
central leptons, but cannot cover the PHX or EMC categories. 
Highly electromagnetic jets work well for the PHX, while only 
non-isolated EMC muons give the correct distribution for 
EMC non-W events. 

FIG. 7: Representative Feynman diagrams for (a) Z + jets 
production, where one lepton is missed, and (b) non-W 
events, in which a jet has to be misidentified as a lepton and 
$\not\!\!E_T$ must be mismeasured to pass the event selection. Because 
the cross section of non-W events is large, they still form a 
significant background process. 

To estimate the non-W fraction in both the pretag and 
tagged sample, the $\not\!\!E_T$ spectrum is fit to a sum of the 
predicted background shapes, as described in detail 
elsewhere [37]. The fit has one fixed component and two 
templates whose normalizations can float. The fixed 
component comes from the Monte Carlo based processes. 
The two floating templates are a Monte Carlo W + jets 
template and a non-W template. The non-W template is 
different depending on the lepton category, as explained 
above. The pretag non-W fraction is used to estimate 
the heavy flavor and light flavor fractions. 

The total non-W contribution for each tagging cate- 
gory is shown in Tables VI and VII. 
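
The structure of the fit described above (one fixed component plus two templates with floating normalizations) can be sketched as follows. This is a simplified stand-in, assuming an unweighted least-squares fit over a binned spectrum; the fitting procedure actually used is the one documented in Ref. [37], and the function names and the toy histograms are hypothetical.

```python
def fit_two_templates(data, fixed, wjets, nonw):
    """Least-squares normalizations (a, b) for data ~ fixed + a*wjets + b*nonw."""
    residual = [d - f for d, f in zip(data, fixed)]
    sww = sum(w * w for w in wjets)
    sqq = sum(q * q for q in nonw)
    swq = sum(w * q for w, q in zip(wjets, nonw))
    srw = sum(r * w for r, w in zip(residual, wjets))
    srq = sum(r * q for r, q in zip(residual, nonw))
    det = sww * sqq - swq * swq
    if det == 0.0:
        raise ValueError("templates are degenerate; normalizations cannot be separated")
    # Solve the 2x2 normal equations with Cramer's rule.
    a = (srw * sqq - srq * swq) / det
    b = (srq * sww - srw * swq) / det
    return a, b


def nonw_fraction(data, fixed, wjets, nonw):
    """Fraction of the fitted total yield attributed to the non-W template."""
    a, b = fit_two_templates(data, fixed, wjets, nonw)
    total = sum(fixed) + a * sum(wjets) + b * sum(nonw)
    return b * sum(nonw) / total


# Toy four-bin histograms (hypothetical numbers).
fixed = [1.0, 1.0, 1.0, 1.0]
wjets = [1.0, 2.0, 3.0, 2.0]
nonw = [4.0, 2.0, 1.0, 0.5]
data = [15.0, 11.0, 10.0, 6.5]  # built as fixed + 2*wjets + 3*nonw

print(fit_two_templates(data, fixed, wjets, nonw))
print(nonw_fraction(data, fixed, wjets, nonw))
```

The fit can only separate the two floating components when their shapes differ, which is why the degenerate case is rejected explicitly.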

C. W + heavy flavor contributions 

W + heavy flavor production is the main source of 
background in the tagged lepton + jets sample. W + jets 
production is simulated using a combination of ALPGEN 
matrix element generation and PYTHIA parton showering 
(same as for Z + jets events). Diagrams for some of the 
sample processes included in ALPGEN are shown in Fig. 8. 

The contribution of this background is estimated using 
the heavy flavor fractions in W + jets production and the 
tagging efficiencies for these processes. These quantities 
are derived from Monte Carlo simulations as explained 
in [37]. The contribution of W + heavy flavor events to 
our signal region is calculated by: 

$$N^{\mathrm{tag}}_{W+\mathrm{HF}} = \left( N^{\mathrm{pretag}}_{\mathrm{data}} \cdot (1 - f^{\mathrm{pretag}}_{\mathrm{non}\text{-}W}) - N^{\mathrm{pretag}}_{\mathrm{MC}} \right) \cdot f_{\mathrm{HF}} \cdot k \cdot \epsilon_{\mathrm{tag}}, \tag{4}$$

where $N^{\mathrm{pretag}}_{\mathrm{data}}$ is the number of observed events in the 
pretag sample, $f^{\mathrm{pretag}}_{\mathrm{non}\text{-}W}$ is the fraction of non-W events in 
the pretag sample, as determined from the fits described 
in Section VII B, and $N^{\mathrm{pretag}}_{\mathrm{MC}}$ is the expected number of 
pretag events in Monte Carlo based samples. The 
fraction of W-boson events with jets matched to heavy flavor 
quarks, $f_{\mathrm{HF}}$, is calculated from Monte Carlo simulation. 
This fraction is multiplied by a scale factor, $k = 1.4 \pm 0.4$, 
to account for differences between the heavy flavor 
fractions observed in data and the Monte Carlo prediction. 
The k-factor is primarily calculated in the one-jet 
control sample and applied to all jet multiplicities. $\epsilon_{\mathrm{tag}}$ is 
the tagging selection efficiency. See Ref. [37] for more 
detail. 
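
The arithmetic of Eq. 4 can be sketched as below. The function and argument names are illustrative, and the example inputs are hypothetical round numbers rather than values from the analysis. Only the scale factor 1.4 ± 0.4 is taken from the text.

```python
def w_heavy_flavor_yield(n_pretag_data, f_nonw_pretag, n_pretag_mc,
                         f_hf, k_factor, eff_tag):
    """Eq. 4: tagged W + heavy flavor yield from pretag inputs."""
    # Pretag events left after removing non-W and MC-based processes.
    n_wjets_pretag = n_pretag_data * (1.0 - f_nonw_pretag) - n_pretag_mc
    return n_wjets_pretag * f_hf * k_factor * eff_tag


def relative_uncertainty_from_k(k_factor=1.4, k_uncertainty=0.4):
    """The yield is linear in k, so its relative uncertainty from k is dk / k."""
    return k_uncertainty / k_factor


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = w_heavy_flavor_yield(
    n_pretag_data=10000.0, f_nonw_pretag=0.10, n_pretag_mc=500.0,
    f_hf=0.05, k_factor=1.4, eff_tag=0.30,
)
print(f"illustrative W+HF yield: {example:.1f}")
print(f"relative uncertainty from k alone: {relative_uncertainty_from_k():.1%}")
```

Because the yield scales linearly with k, the uncertainty on the scale factor alone contributes close to 29% to the W + heavy flavor prediction.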

FIG. 8: Some representative diagrams of W + jets 
production. $Wc\bar{c}$ is the same process as $Wb\bar{b}$, but with charm quarks 
replacing the b quarks. 



D. Rates of events with mistagged jets 

The other W + jets contribution which can mimic the 
$\ell\nu b\bar{b}$ final state is W + LF. In this case, jets from light 
partons tagged as heavy flavor jets can contribute to 
the tagged sample. We count the events in the pretag 
sample and apply a mistag matrix to calculate the 
fraction of W + light flavor events that will be mistagged 
($N_{\mathrm{mistag}}/N_{\mathrm{pretag}}$). The mistag rate parametrization is 
described in Section V C. Then, in order to only use 
mistagged events from W + LF processes, we subtract the 
fraction of pretag events which are due to non-W, 
electroweak, top quark and W + heavy flavor processes from 
the pretag sample. The predicted number of background 
events from W + LF processes is then calculated as: 

$$N^{\mathrm{tag}}_{W+\mathrm{LF}} = \left( N^{\mathrm{pretag}}_{\mathrm{data}} \cdot (1 - f^{\mathrm{pretag}}_{\mathrm{non}\text{-}W}) - N^{\mathrm{pretag}}_{\mathrm{MC}} - N^{\mathrm{pretag}}_{W+\mathrm{HF}} \right) \cdot \frac{N_{\mathrm{mistag}}}{N_{\mathrm{pretag}}}. \tag{5}$$

The total $Wb\bar{b}$, $Wc\bar{c}/Wc$, and W + LF contributions 
for each tagging category are shown in Tables VI and VII. 
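
A short sketch of the subtraction in Eq. 5 follows. The names are illustrative and the inputs are hypothetical. The mistag fraction is passed in directly, standing in for the ratio obtained from the mistag matrix.

```python
def w_light_flavor_yield(n_pretag_data, f_nonw_pretag, n_pretag_mc,
                         n_pretag_whf, mistag_fraction):
    """Eq. 5: mistagged W + light flavor yield from pretag inputs."""
    # Pretag W + LF events: remove non-W, MC-based, and W + heavy flavor events.
    n_wlf_pretag = (
        n_pretag_data * (1.0 - f_nonw_pretag) - n_pretag_mc - n_pretag_whf
    )
    if n_wlf_pretag < 0.0:
        raise ValueError("subtracted backgrounds exceed the pretag sample")
    return n_wlf_pretag * mistag_fraction


# Hypothetical inputs for illustration only.
example = w_light_flavor_yield(
    n_pretag_data=10000.0, f_nonw_pretag=0.10, n_pretag_mc=500.0,
    n_pretag_whf=1500.0, mistag_fraction=0.01,
)
print(f"illustrative W+LF yield: {example:.1f}")
```

The guard against a negative pretag remainder is a design choice of this sketch; it flags inputs for which the subtracted components are mutually inconsistent.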



E. Summary of background estimation 

The contributions of individual background sources 
have been described in this section. The summary of the 
background and signal ($m_H = 115$ GeV/$c^2$) estimates 
and the number of observed events are shown for the 
three different tagging categories in Tables VI and VII. 
The numbers of expected and observed events are also 
shown in Fig. 9 as a function of jet multiplicity. In these 
tables and plots, all lepton types are combined. In 
general, the numbers of expected and observed events are 
in good agreement within the uncertainties on the 
background predictions. 

TABLE VI: Summary of predicted numbers of signal 
($m_H = 115$ GeV/$c^2$) and background W + 2 jets events 
passing all the event selection requirements with systematic 
and statistical uncertainties. The total numbers of observed 
events passing the event selection are also shown. 

Process                      SVSV         SVJP         SVnoJP
WW                           0.9±0.2      3.3±1.3      106±13
WZ                           8.3±1.2      6.2±1.0      35.1±3.9
ZZ                           0.30±0.05    0.3±0.1      1.4±0.2
$t\bar{t}$ (lepton+jets)     47.0±7.8     37.6±6.8     205±29
$t\bar{t}$ (dilepton)        28.2±4.6     20.0±3.4     77±11
Single top (t-channel)       6.3±1.1      6.3±1.3      116±17
Single top (s-channel)       26.2±4.3     18.4±3.1     66.0±9.1
Z+jets                       4.2±0.7      5.1±1.3      80±12
$Wb\bar{b}$                  142±46       121±39       978±295
$Wc\bar{c}/Wc$               13.8±4.7     46±17        959±296
W + LF                       4.7±1.5      19±11        946±138
Non-W                        19.0±7.6     29±12        298±119
Total prediction             301±53       312±59       3869±619
WH (115 GeV/$c^2$)           4.06±0.48    2.80±0.35    10.27±0.81
Observed                     282          311          3878


F. Validation of the background model 

Since the analysis described here relies on Monte Carlo 
simulation, the result depends on the proper modeling of 
the signal and the background processes. For that reason, 
the prediction of the background model is compared with 
the observed events for hundreds of distributions in the 
signal region and in different control regions. Figs. 10, 11, 
and 12 show examples of validation plots for two and 
three jet bins, in a control region with no b-tagged jets 
(to check the W + LF shapes) and in the signal region 
with at least one tagged jet. In general, the agreement is 
good. The lepton and jet transverse energy distributions 
are the least well modeled. To check the effect of this 
mismodeling we derive weights from the lepton and jet 
transverse energies in the control region and apply 
them to the discriminant variable in the signal 
region. We check the effect of each variable one at a time 
by calculating the expected limits in each case and find 
that the effect on the result is not significant. The 
validation of the modeling of other observable quantities 
is shown later in this paper. 

TABLE VII: Summary of predicted numbers of signal 
($m_H = 115$ GeV/$c^2$) and background W + 3 jets events 
passing all the event selection requirements with systematic 
and statistical uncertainties. The total numbers of observed 
events passing the event selection are also shown. 

Process                      SVSV         SVJP         SVnoJP
WW                           1.0±0.?      ?            ?
WZ                           2.3±0.3      1.9±0.4      9.4±1.1
ZZ                           0.19±0.03    0.15±0.03    0.6±0.1
$t\bar{t}$ (lepton+jets)     188±31       161±29       504±70
$t\bar{t}$ (dilepton)        25.4±4.1     18.2±3.1     57.6±8.0
Single top (t-channel)       5.6±0.9      5.0±0.9      26.1±3.7
Single top (s-channel)       8.9±1.5      6.8±1.2      19.5±2.7
Z+jets                       3.0±0.5      4.0±1.1      29.7±4.4
$Wb\bar{b}$                  49±16        47±16        258±78
$Wc\bar{c}/Wc$               7.1±2.5      22.9±8.6     237±73
W + LF                       3.2±1.1      11.3±5.9     255±38
Non-W                        9.6±3.9      21.5±8.6     93±37
Total prediction             303±39       303±42       1522±177
WH (115 GeV/$c^2$)           1.16±0.14    0.85±0.12    2.57±0.21
Observed                     318          302          1491


VIII. MATRIX ELEMENT METHOD 

The number of expected signal events after the 
initial selection is much smaller than the uncertainty in 
the background prediction. For example, for a Higgs 
boson mass of 115 GeV/$c^2$ the signal-to-background 
ratio is at best only about 1/70 even in the most 
signal-rich b-tagging categories. Thus, a method based only on 
counting the total number of events is unsuitable. The 
invariant mass distribution of the two leading jets in the 
event is the most powerful variable for discriminating 
signal from background, but it is limited by the jet energy 
resolution. Figure 13 shows the invariant mass 
distribution of the two leading jets for two-jet SVSV events. 
Further discrimination between signal and background is 
needed. 
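
The statement about the signal-to-background ratio can be checked directly from the two-jet numbers in Table VI. The sketch below is illustrative; the dictionary layout and function name are assumptions, while the yields are the Table VI entries for a Higgs boson mass of 115 GeV/c^2.

```python
# (signal, total background prediction, background uncertainty), two-jet events
TWO_JET_YIELDS = {
    "SVSV": (4.06, 301.0, 53.0),
    "SVJP": (2.80, 312.0, 59.0),
    "SVnoJP": (10.27, 3869.0, 619.0),
}


def signal_summary(signal, background, background_uncertainty):
    """Signal relative to the background and to the background uncertainty."""
    return {
        "s_over_b": signal / background,
        "s_over_db": signal / background_uncertainty,
    }


for category, yields in TWO_JET_YIELDS.items():
    summary = signal_summary(*yields)
    print(
        f"{category}: S/B = 1/{1.0 / summary['s_over_b']:.0f}, "
        f"S/dB = {summary['s_over_db']:.3f}"
    )
```

In every category the expected signal is well below one tenth of the background uncertainty, which is why a counting experiment alone is not sensitive.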

FIG. 9: The predicted and observed number of events for lepton + jets events. The observed events are indicated with points, 
and the shaded histograms show the signal and background predictions, which are stacked to form the total prediction. "Other" 
is the sum of the WW, WZ, ZZ, and Z+jets contributions and W+HF is the sum of the $Wb\bar{b}$, $Wc\bar{c}$, and $Wc$ contributions. 
From left to right: SVnoJP, SVJP, and SVSV events. 

A matrix element (ME) method [14, 15] is used in this 
search to discriminate signal from background events. 
This multivariate method relies on the evaluation of event 
probability densities (commonly called event 
probabilities) for signal and background processes based on 
calculations of the relevant standard model differential cross 
sections. The ratio of signal and background event 
probabilities is then used as a discriminant variable called the 
event probability discriminant, EPD. The goal is to 
maximize sensitivity through the use of all kinematic 
information contained in each event analyzed. The discriminant 
distributions are optimized separately for each Higgs 
boson mass hypothesis in order to extract the maximum 
sensitivity. Using the EPD as the discriminant variable 
leads to an increase in sensitivity of ~20% with respect 
to only using the invariant mass distribution of the two 
leading jets in the event. 
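
The exact definition of the EPD is not given in this passage. As a hedged illustration, the sketch below assumes a commonly used form, the signal probability divided by the sum of the signal probability and weighted background probabilities, which is a monotonic function of the signal-to-background probability ratio and is bounded between 0 and 1. The function name and the optional weights are hypothetical.

```python
def event_probability_discriminant(p_signal, p_backgrounds, weights=None):
    """Assumed form: EPD = P_s / (P_s + sum_i w_i * P_b,i), bounded in [0, 1]."""
    if weights is None:
        weights = [1.0] * len(p_backgrounds)
    if len(weights) != len(p_backgrounds):
        raise ValueError("need exactly one weight per background probability")
    denominator = p_signal + sum(w * p for w, p in zip(weights, p_backgrounds))
    if denominator <= 0.0:
        raise ValueError("event probabilities must not all vanish")
    return p_signal / denominator


# Hypothetical event probability densities for three events.
print(event_probability_discriminant(3.0, [0.5, 0.5]))  # signal-like
print(event_probability_discriminant(1.0, [1.0]))       # ambiguous
print(event_probability_discriminant(0.1, [2.0, 1.9]))  # background-like
```

Signal-like events accumulate near 1 and background-like events near 0, so a binned distribution of this quantity can be fit or used to set limits.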



A. Event probability 

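If we could measure the four-vectors of the initial and 
final state particles precisely, the event probability would 
be: 

$$P_{\mathrm{evt}} \sim \frac{d\sigma}{\sigma}, \tag{6}$$

where the differential cross section is given by [52]: 

$$d\sigma = \frac{(2\pi)^4 |\mathcal{M}|^2}{4\sqrt{(q_1 \cdot q_2)^2 - m_{q_1}^2 m_{q_2}^2}} \, d\Phi_n(q_1 + q_2; p_1, \ldots, p_n), \tag{7}$$

where $\mathcal{M}$ is the Lorentz-invariant matrix element; $q_1$, $q_2$ 
and $m_{q_1}$, $m_{q_2}$ are the four momenta and masses of the 
incident particles; $p_1, \ldots, p_n$ are the four momenta of the 
final particles, and $d\Phi_n$ is the n-body phase space given 
by [52]: 

$$d\Phi_n = \delta^4\!\left(q_1 + q_2 - \sum_{i=1}^{n} p_i\right) \prod_{i=1}^{n} \frac{d^3 p_i}{(2\pi)^3 \, 2E_i}. \tag{8}$$
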

However, several effects have to be considered: (1) the 
partons in the initial state cannot be measured, (2) neu- 
trinos in the final state are not measured directly, and (3) 
the energy resolution of the detector cannot be ignored. 



To address the first point, the differential cross section 
is weighted by parton distribution functions. To address 
the second and third points, we integrate over all particle 
momenta which we do not measure (the pz of the neu- 
trino), or do not measure well, due to resolution effects 
(the jet energies) . The integration gives a weighted sum 
over all possible parton-level variables y leading to the 
observed set of variables x measured with the CDF de- 
tector. The mapping between the particle variables y and 
the measured variables x is established with the transfer 
function W{y,x), which encodes the detector resolution 
and is described in detail in Section VIII B. Thus, the 
event probability now takes the form: 



$$P(x) = \frac{1}{\sigma} \int d\sigma(y) \, dq_1 \, dq_2 \, f(y_1) f(y_2) \, W(y,x), \tag{9}$$

where $d\sigma(y)$ is the differential cross section in terms of 
the particle variables; $f(y_i)$ are the parton distribution 
functions, with $y_i$ being the fraction of the proton 
momentum carried by the parton ($y_i = E_{q_i}/E_{\mathrm{beam}}$); and 
$W(y,x)$ is the transfer function. Substituting Eqs. 7 and 
8 into Eq. 9, and considering a final state with four 
particles ($n = 4$), transforms the event probability to: 

$$P(x) = \frac{1}{\sigma} \int 2\pi^4 |\mathcal{M}|^2 \, \frac{f(y_1)}{|E_{q_1}|} \frac{f(y_2)}{|E_{q_2}|} \, W(y,x) \, d\Phi_4 \, dE_{q_1} \, dE_{q_2}, \tag{10}$$

where the masses and transverse momenta of the initial 
partons are neglected (i.e., $\sqrt{(q_1 \cdot q_2)^2 - m_{q_1}^2 m_{q_2}^2} \simeq 2E_{q_1}E_{q_2}$). 
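
The numerical prefactor in Eq. 10 follows from the flux factor of Eq. 7 once the incoming partons are treated as massless and collinear with the beams. The short derivation below is a worked check, assuming head-on partons with four momenta $q_1 = (E_{q_1}, 0, 0, E_{q_1})$ and $q_2 = (E_{q_2}, 0, 0, -E_{q_2})$.

```latex
\begin{align}
q_1 \cdot q_2 &= E_{q_1}E_{q_2} - (E_{q_1})(-E_{q_2}) = 2E_{q_1}E_{q_2}, \\
\sqrt{(q_1 \cdot q_2)^2 - m_{q_1}^2 m_{q_2}^2}
  &\simeq 2E_{q_1}E_{q_2} \quad (m_{q_1} = m_{q_2} = 0), \\
\frac{(2\pi)^4}{4\sqrt{(q_1 \cdot q_2)^2 - m_{q_1}^2 m_{q_2}^2}}
  &\simeq \frac{16\pi^4}{8E_{q_1}E_{q_2}}
   = \frac{2\pi^4}{E_{q_1}E_{q_2}}.
\end{align}
```

This reproduces the factor $2\pi^4$ and the two energies that appear in the denominators under the parton distribution functions.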

The squared matrix element $|\mathcal{M}|^2$ for the event 
probability is calculated at leading order by using the HELAS 
(Helicity Amplitude Subroutines for Feynman Diagram 
Evaluations) package [53]. The subroutine calls for 
a given process are automatically generated by 
MADGRAPH [46]. For events with two jets, event probability 
densities for the WH signal (for 11 Higgs boson masses), 
as well as for the s-channel and t-channel single top, $t\bar{t}$, 
$Wb\bar{b}$, $Wc\bar{c}$, $Wc$, mistags ($Wgj$ and $Wgg$) and diboson 
(WW, WZ) background processes are calculated. The 
WH channel is mainly produced in two-jet events, but it 
can happen that an initial or final state radiation jet is 
identified as the third jet of the event. Including three-jet 
events increases signal acceptance and gains sensitivity to 
the Higgs boson signal. In the case of events with three 
jets in the final state, event probability densities for the 
WH signal, as well as for the s-channel and t-channel 
single top, $t\bar{t}$, $Wb\bar{b}$, and $Wc\bar{c}$ processes are calculated. The 
WH Feynman diagrams include only those with initial 
and final state radiation, and exclude those in which a 
ggH coupling is present as these contribute less than 1% 
to the total cross section, but increase the computation 
time by more than 20%. 

The integration performed in the matrix element 
calculation of this analysis is identical to the one for the search 
for single top production [37]. The matrix elements 
correspond to fixed-order tree-level calculations and thus are 
not perfect representations of the probabilities for each 
process. This limitation of the matrix element 
calculations for the discriminant affects the sensitivity of the 
analysis but not its correctness, as the same matrix 
elements are calculated for both observed and Monte Carlo 
events, and the Monte Carlo simulation uses parton showers 
to approximate higher-order effects on kinematic 
distributions. The different combinations of matching jets to 
quarks are also considered [54]. 

FIG. 10: Validation plots comparing observed events and Monte Carlo distributions for basic kinematic quantities for events 
with two (a-f) and three (g-l) jets and no b tags. The observed events are indicated with points. 

FIG. 11: Validation plots comparing observed events and Monte Carlo distributions for basic kinematic quantities for events 
with two (a-f) and three (g-l) jets and at least one b tag. The observed events are indicated with points. 

FIG. 12: Validation plots comparing observed events and Monte Carlo distributions for missing transverse energy for events 
with two (a and c) and three jets (b and d), with no b tags (top) and with at least one b tag (bottom). The observed events 
are indicated with points. 

FIG. 13: Invariant mass distribution of the two leading jets 
for 2-jet SVSV events (W + 2 jets, SVSV; horizontal axis: 
dijet invariant mass in GeV/$c^2$). The Higgs boson signal 
contribution ($m_H = 115$ GeV/$c^2$) is multiplied by a factor 5 
to make it visible. 

A data-MC comparison of the measured four vectors 
can be found in Figs. 14 and 15. This comparison is done 
in the control (0 tag) and signal ($\ge 1$ tag) regions. In 
general, good agreement between observed data and MC 
expectation is found. 



B. Transfer functions 

The transfer function $W(y,x)$ gives the probability of 
measuring the set of observable variables x given specific 
values of the parton variables y. In the case of 
well-measured quantities, $W(y,x)$ is taken as a $\delta$-function 
(i.e., the measured momenta are used in the differential 
cross section calculation). When the detector resolution 
cannot be ignored, $W(y,x)$ is a parametrized resolution 
function based on fully simulated Monte Carlo events. 
For unmeasured quantities, such as the three components 
of the momentum of the neutrino, the transfer function 
is constant. The choice of transfer function affects the 
sensitivity of the analysis but not its correctness, since 
the same transfer function is applied to both observed 
and Monte Carlo events. 

FIG. 14: Validation plots comparing observed and MC simulated events for the four-vector ($E$, $p_x$, $p_y$, $p_z$) of the lepton and 
the jets in 2-jet untagged events. 

FIG. 15: Validation plots comparing observed and MC simulated events for the four-vector ($E$, $p_x$, $p_y$, $p_z$) of the lepton and 
the jets in events with two jets and at least one b-tagged jet. 

Lepton energies are measured well by the CDF 
detector and $\delta$-functions are assumed for their transfer 
functions. The angular resolution of the calorimeter 
and muon chambers is also sufficient and $\delta$-functions are 
also assumed for the transfer function of the lepton and 
jet directions. The resolution of jet energies, however, 
is broad and it is described by a jet transfer function 
$W_{\mathrm{jet}}(E_{\mathrm{parton}}, E_{\mathrm{jet}})$. Using these assumptions, $W(y,x)$ 
takes the following form for the four final state 
particles considered in the WH search (lepton, neutrino and 
two jets): 

$$W(y,x) = \delta^3(\vec{p}_\ell^{\,y} - \vec{p}_\ell^{\,x}) \prod_{i=1}^{2} \delta^2(\Omega_i^y - \Omega_i^x) \prod_{k=1}^{2} W_{\mathrm{jet}}(E_{p_k}, E_{j_k}), \tag{11}$$

where $\vec{p}_\ell^{\,y}$ and $\vec{p}_\ell^{\,x}$ are the produced and measured 
lepton momenta, $\Omega_i^y$ and $\Omega_i^x$ are the produced quark and 
measured jet angles ($\cos\theta$, $\phi$), and $E_{p_k}$ and $E_{j_k}$ are the 
produced quark and measured jet energies. 



The jet energy transfer functions map parton energies 
to measured jet energies after correction for instrumental 
detector effects [36]. This mapping includes effects of 
radiation, hadronization, measurement resolution, and 
energy outside the jet cone not included in the 
reconstruction algorithm. The jet transfer functions are obtained by 
parametrizing the jet response in fully simulated Monte 
Carlo events. The distributions of the difference between 
the parton and jet energies, $\delta_E = (E_{\mathrm{parton}} - E_{\mathrm{jet}})$, are 
parametrized as a sum of two Gaussian functions: 

$$W_{\mathrm{jet}}(E_{\mathrm{parton}}, E_{\mathrm{jet}}) = \frac{1}{\sqrt{2\pi}\,(p_2 + p_3 p_5)} \left[ \exp\left( \frac{-(\delta_E - p_1)^2}{2 p_2^2} \right) + p_3 \exp\left( \frac{-(\delta_E - p_4)^2}{2 p_5^2} \right) \right], \tag{12}$$

one to account for the sharp peak and the other one to 
account for the asymmetric tail, because the $\delta_E$ 
distributions (shown in Fig. 16 for different flavor jets) are 
asymmetric and feature a significant tail at positive $\delta_E$. 


FIG. 16: Normalized $\delta_E = (E_{\mathrm{parton}} - E_{\mathrm{jet}})$ distributions for 
jets matched to partons in WH with a Higgs boson mass 
of 115 GeV/$c^2$ (b-jets), $Wjg$ (light-jets and gluons), and $Wcg$ 
(c-jets) Monte Carlo events (passed through full detector 
simulation). Legend: bottom (from $Wb\bar{b}$), light (from $Wjg$), 
gluon (from $Wjg$), charm (from $Wcg$). Horizontal axis: 
$E_{\mathrm{parton}} - E_{\mathrm{jet}}$ [GeV]. 



Different transfer functions are created depending on 
the physics process and the flavor of the jet due to the 
different kinematics as shown in Fig. 16. To take into 
account the different kinematics of the physics processes 
used in this analysis (WH [100-150] GeV/$c^2$, $Wb\bar{b}$, $t\bar{t}$, 
s-channel and t-channel single top, $Wc\bar{c}$, $Wcg$, $Wjg$, $Wgg$, 
WW, and WZ) and the different flavor of jet (b, c, light 
and gluons), 23 different transfer functions are created as 
explained below. 

One of the novelties of this analysis is that, in order 
to better reproduce the parton energy ($E_{\mathrm{parton}}$), a neural 
network output ($O_{\mathrm{NN}}$) is used instead of the measured jet 
energy ($E_{\mathrm{jet}}$). This output distribution is not a neural 
network output event classifier distribution, but rather 
a functional approximation to the parton energy. So 
$W_{\mathrm{jet}}(E_{\mathrm{parton}}, E_{\mathrm{jet}})$ is replaced by $W_{\mathrm{jet}}(E_{\mathrm{parton}}, O_{\mathrm{NN}})$, 
which is commonly referred to as a neural network 
transfer function (or NN TF). The $O_{\mathrm{NN}}$ used in the 
analysis is the result of training neural networks (NNs) using 
the Stuttgart neural network simulator (SNNS) [55]. For 
each physics process considered, a different NN is 
constructed for each type of jet in that process as shown in 
Table VIII. By using the jets from the specific process to 
train the NN it is assured that the NN is optimized for 
the kinematics of the jets associated with that process. 



TABLE VIII: Types of jets used to train the different NNs 
for each process. 

Process                 b jets   c jets   light jets   gluons
WH (11 $m_H$ values)    X
$Wb\bar{b}$             X
$Wc\bar{c}$                      X
$t\bar{t}$              X
s-channel               X
t-channel               X                 X
$Wcg$                            X                     X
$Wjg$                                     X            X
$Wgg$                                                  X
WW-WZ                                     X




The training of the NNs is based on MC simulated 
events. The MC events used for the trainings are those 
remaining after applying the analysis event selection 
(see Section IV) and the jets are required to be aligned 
within a cone of $\Delta R < 0.4$ with the closest flavored 
parton (b or c depending on the physics process) coming 
from the hard scattering process. 

All the NN trainings have the same architecture and 
input variables. Seven input variables related to the jet 
kinematics have been used: the total corrected energy of 
the jet ($E$), the raw (measured) transverse momentum 
of the jet ($p_T$), the azimuthal angle of the jet ($\phi$), the 
pseudorapidity of the jet ($\eta$), the raw (measured) energy 
of the jet, the total corrected energy of the jet in a cone of 
radius $R < 0.7$ ($E$ cone 0.7), and the sum over the tracks 
in the jet of the ratio of the transverse momentum of the 
track and the sine of the polar angle $\theta$ of the track ($\sum p$). 
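
The last input variable is the sum over tracks of the transverse momentum divided by the sine of the polar angle. Since the transverse momentum equals the momentum magnitude times that sine, this is the scalar sum of track momentum magnitudes. The sketch below computes it; the function name and the (pT, theta) tuple layout are illustrative assumptions.

```python
import math


def sum_track_momentum(tracks):
    """Sum of pT / sin(theta) over tracks, i.e. the scalar sum of |p|.

    `tracks` is an iterable of (pt, theta) pairs with theta in radians,
    strictly between 0 and pi.
    """
    total = 0.0
    for pt, theta in tracks:
        sin_theta = math.sin(theta)
        if sin_theta <= 0.0:
            raise ValueError("theta must lie strictly between 0 and pi")
        total += pt / sin_theta
    return total


# Two hypothetical tracks: one central, one at 30 degrees to the beam axis.
tracks = [(3.0, math.pi / 2.0), (1.0, math.pi / 6.0)]
print(f"sum of track momenta: {sum_track_momentum(tracks):.3f} GeV/c")
```

A forward track contributes more than its transverse momentum alone suggests, which is the information this variable adds beyond the jet transverse momentum.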

Figure 17 shows the data-MC comparison of the seven 
input variables for the leading jet in two-jet events where 
at least one of the jets has been tagged by SecVtx, which 
also validates the MC expectations in this signal region. 



FIG. 17: Validation plots comparing observed and Monte 
Carlo simulated events for the seven input variables of the 
neural network transfer function for the first leading jet for 
events with two jets and at least one b tag. The observed 
events are indicated with points. Panel axes (W + 2 jets, 
$\ge 1$ b tag) include: 1st jet $E$ [GeV], 1st jet $p_T$ [GeV/c], 
1st jet $\phi$, 1st jet raw $E$ [GeV], 1st jet $E$ cone 0.7 [GeV], 
and 1st jet SumP [GeV]. 

Figure 18 shows the difference between the parton 
energy and the corrected jet energy and between the parton 
energy and the $O_{\mathrm{NN}}$ for four different physics processes, 
WH, diboson (WW, WZ), $Wb\bar{b}$, and $Wgg$. In all cases 
the average $O_{\mathrm{NN}}$ is closer to the parton energy than the 
average corrected jet energy, and the distributions 
are narrower. Therefore, since the $O_{\mathrm{NN}}$ provides a 
better jet resolution, using it as an input to the transfer 
function should help to improve the performance of the 
transfer function. 

FIG. 18: Difference between the parton energy and the 
measured jet energy (empty histogram) and the $O_{\mathrm{NN}}$ (dashed 
histogram) for b-jets in WH ($m_H = 115$ GeV/$c^2$) events (a), 
b-jets in $Wb\bar{b}$ events (b), light jets in diboson (WW, WZ) 
events (c), and gluon jets in $Wgg$ events (d). 

The functional form used to parametrize $E_{\rm parton} - O_{\rm NN}$
is the same as the one described above for $\delta_E$ (Eq. 12).
More details on the performance of the NN TF can be
found in Ref. [56].

The output of the neural network is used to correct the 
measured energy of all the jets from the events that pass 
the analysis selection. As a cross-check, a comparison of 
the invariant mass resolution of the dijet system in WH 
signal events before and after applying this correction is 
performed. A way to do this is to fit the invariant mass 
distribution to a Gaussian function and compare the res- 
olution, defined as the sigma divided by the mean of the 
fit, for all Higgs boson masses. The results are shown 
in Fig. 19 (left). As expected, the invariant mass resolution
is better (smaller sigma) after correcting by the
$O_{\rm NN}$. The linearity of the correction is also checked; see
Fig. 19 (right). Both functions are linear. The only difference
is that the reconstructed invariant mass is closer
to the generated one once the correction is applied.
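
The relative resolution used in this cross-check can be sketched as
follows. This is a simplified illustration under stated assumptions:
for an unbinned sample, the maximum likelihood Gaussian fit reduces to
the sample mean and the population standard deviation, whereas the
analysis may fit a binned distribution. The function name and the toy
mass values are hypothetical.

```python
import statistics

def relative_resolution(masses):
    """Sigma divided by mean of an unbinned Gaussian maximum likelihood fit."""
    mean = statistics.fmean(masses)
    sigma = statistics.pstdev(masses, mu=mean)
    return sigma / mean

# Toy dijet masses in GeV/c^2 (hypothetical values, not analysis data).
before = relative_resolution([90.0, 100.0, 110.0])
after = relative_resolution([95.0, 100.0, 105.0])
```

A narrower distribution around the same mean gives a smaller ratio, which
is the sense in which the corrected jets are said to have better resolution.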

FIG. 19: Left (right): Relative resolution, sigma divided by
the mean of the Gaussian fit to the invariant mass distributions,
as a function of the invariant mass (reconstructed vs
generated invariant mass) before and after applying the NN
correction to the measured jets.

C. Event probability discriminant

The event probability densities are used as inputs to
build an event probability discriminant, a variable for
which the distributions of signal events and background
events are maximally different.

An intuitive discriminant which relates the signal and
background probability densities is the ratio of signal
probability over signal plus background probability,
EPD $= P_{\rm signal}/(P_{\rm signal} + P_{\rm background})$. By construction,
this discriminant is close to zero for background-like
events ($P_{\rm background} > P_{\rm signal}$) and close to unity for
signal-like events ($P_{\rm signal} > P_{\rm background}$). Equations
13 and 14 are the definitions of the event probability
discriminants used in this analysis for single and double
$b$-tagged events, respectively:



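$$\mathrm{EPD} = \frac{b \cdot \hat{P}_{WH}}{b \cdot (\hat{P}_{WH} + \hat{P}_{Wb\bar{b}} + \hat{P}_{t\bar{t}} + \hat{P}_{s} + \hat{P}_{t}) + (1 - b) \cdot (\hat{P}_{Wc\bar{c}} + \hat{P}_{Wcj} + \hat{P}_{W+l} + \hat{P}_{Wgg} + \hat{P}_{\rm dib})} \qquad (13)$$

$$\mathrm{EPD} = \frac{b_1 b_2 \cdot \hat{P}_{WH}}{b_1 b_2 \cdot (\hat{P}_{WH} + \hat{P}_{Wb\bar{b}} + \hat{P}_{t\bar{t}} + \hat{P}_{s}) + b_1 (1 - b_2) \cdot \hat{P}_{t} + (1 - b_1)(1 - b_2) \cdot (\hat{P}_{Wc\bar{c}} + \hat{P}_{Wcj} + \hat{P}_{W+l} + \hat{P}_{Wgg} + \hat{P}_{\rm dib})} \qquad (14)$$

where $\hat{P}_i = C_i \cdot P_i$, $P_i$ is the event probability of a given
physics process ($WH$, $s$-channel, $Wb\bar{b}$, ...), $C_i$ are additional
coefficients (to be defined below), and $b$ (defined
as the $b$-jet probability) is a transformation of the output
of the neural network jet flavor separator ($b_{\rm NN}$) [37, 57].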
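
A minimal sketch of how Eqs. 13 and 14 could be evaluated is given
below. It assumes that the numerator is the $b$-weighted $WH$ term, in
line with the definition EPD $= P_{\rm signal}/(P_{\rm signal} + P_{\rm background})$;
the function names and the dictionary keys for the processes are
hypothetical, and the input probabilities are taken to be already
multiplied by their coefficients $C_i$.

```python
LIGHT = ("Wcc", "Wcj", "Wl", "Wgg", "dib")  # processes without a b quark

def epd_single_tag(p, b):
    """Single-tag discriminant (Eq. 13); p maps process name to C_i * P_i."""
    heavy = p["WH"] + p["Wbb"] + p["tt"] + p["s"] + p["t"]
    light = sum(p[name] for name in LIGHT)
    denominator = b * heavy + (1.0 - b) * light
    return b * p["WH"] / denominator if denominator > 0.0 else 0.0

def epd_double_tag(p, b1, b2):
    """Double-tag discriminant (Eq. 14); t-channel enters with b1 * (1 - b2)."""
    two_b = p["WH"] + p["Wbb"] + p["tt"] + p["s"]
    light = sum(p[name] for name in LIGHT)
    denominator = (b1 * b2 * two_b
                   + b1 * (1.0 - b2) * p["t"]
                   + (1.0 - b1) * (1.0 - b2) * light)
    return b1 * b2 * p["WH"] / denominator if denominator > 0.0 else 0.0
```

With all weighted probabilities equal and $b = 0.5$, both functions return
0.1, since the signal term is then one of ten equal contributions.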

Extra non-kinematic information is introduced into the
event probability discriminant through $b_{\rm NN}$ and $C_i$. The
$C_i$ coefficients are included in the EPD and used to
optimize the discrimination power between signal and
background. This set of coefficients is obtained by an
iterative technique that involves the repeated generation
of different sets of parameters and the computation of
the expected limit for each set. However, because the
calculation of limits with the inclusion of systematic
uncertainties is computationally intensive, the optimization
is implemented by performing a faster calculation of a
figure of merit based only on statistical uncertainties.
This approach has been used successfully in previous versions of
this analysis and in the most recent measurement of the
$WW + WZ$ production cross section [17].

For any given set of coefficients $C_i$, the Monte Carlo
templates of the EPD variable are generated, normalized
to the corresponding number of expected signal and
background events calculated in Section VII. The figure of
merit is obtained from these templates using a maximum
likelihood fit to extract $\xi$ and its error $\sigma_\xi$, where $\xi$ is a
multiplicative factor to the expected $WH$ cross section.
The negative logarithm of the likelihood used is:

$$-\log(\mathcal{L}(\xi)) = \ldots\,(\xi \cdot S_k)\,\ldots \qquad (15)$$

where $S_k$ and $B_k$ are the expected number of signal and
background events in the $k$-th bin and $\Delta S_k$ and $\Delta B_k$ are
the statistical uncertainties on $S_k$ and $B_k$, respectively.
The variable $\xi$ represents the most likely value of signal,
in units of the expected signal cross section, that
can be fitted on the background templates and should
always be close to zero after the minimization. The error
on the value of $\xi$ is obtained from the minimization
and is related to the strength by which the signal can be
differentiated from the background templates in units of
the expected signal cross section; the larger the error the
smaller the strength and vice versa. For each set of EPD
templates the figure of merit is defined as $1/\sigma_\xi$.
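
The idea of a statistics-only figure of merit can be illustrated with
the sketch below. It does not reproduce Eq. 15: it assumes a Gaussian
form in which background-only pseudo-data are compared with the
prediction (signal scale times $S_k$ plus $B_k$), with a per-bin
variance built from the Poisson term $B_k$ and the template
uncertainties. The function names and this choice of variance are
assumptions made for illustration only.

```python
import math

def neg_log_likelihood(scale, signal, background, d_signal, d_background):
    """Assumed Gaussian -log L for background-only pseudo-data."""
    total = 0.0
    for s, b, ds, db in zip(signal, background, d_signal, d_background):
        variance = b + db ** 2 + (scale * ds) ** 2  # assumption, see lead-in
        if variance > 0.0:
            total += (scale * s) ** 2 / (2.0 * variance)
    return total

def scale_error(signal, background, d_signal, d_background, upper=1.0e6):
    """Scale at which -log L rises by 0.5 above its minimum at scale = 0."""
    def rise(x):
        return neg_log_likelihood(x, signal, background, d_signal, d_background) - 0.5
    if rise(upper) < 0.0:
        return math.inf  # templates carry no sensitivity to the signal
    lower = 0.0
    for _ in range(200):  # bisection; the rise is monotonic in the scale
        middle = 0.5 * (lower + upper)
        if rise(middle) < 0.0:
            lower = middle
        else:
            upper = middle
    return 0.5 * (lower + upper)

def figure_of_merit(signal, background, d_signal, d_background):
    """Inverse of the fitted error on the signal scale."""
    return 1.0 / scale_error(signal, background, d_signal, d_background)
```

When the template uncertainties are set to zero, this reduces to the
familiar $\sqrt{\sum_k S_k^2/B_k}$, so templates with better signal and
background separation give a larger figure of merit.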

The best set of coefficients is then obtained using an
iterative technique. At the beginning, the current
best set of coefficients is set to the maximum
matrix element probability values obtained in the respective
samples. For every iteration a trial set of coefficients
is formed by introducing random changes in some
of the coefficients from the current best set, creating new
EPD templates, and calculating the corresponding figure
of merit of these new EPDs. The set of coefficients that
produces the best figure of merit based on $\sim 2000$ iterations
is considered optimal and is used for the analysis.
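
The iterative search described above can be sketched as a simple random
hill climb. The size and form of the random changes are not specified
in the text, so the multiplicative log-normal perturbation used here is
an assumption, as are the function name and its arguments; the
`figure_of_merit` callable stands in for the template building and fit.

```python
import random

def optimize_coefficients(initial, figure_of_merit, n_iterations=2000, seed=0):
    """Keep the trial set of coefficients whenever it improves the figure of merit."""
    rng = random.Random(seed)
    best = dict(initial)
    best_fom = figure_of_merit(best)
    names = sorted(best)
    for _ in range(n_iterations):
        trial = dict(best)
        # Change a random subset of the coefficients (assumed perturbation).
        for name in rng.sample(names, rng.randint(1, len(names))):
            trial[name] *= rng.lognormvariate(0.0, 0.3)
        fom = figure_of_merit(trial)
        if fom > best_fom:
            best, best_fom = trial, fom
    return best, best_fom
```

The multiplicative change keeps every coefficient positive, which is a
convenient property for weights that multiply probability densities.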

After the event selection and applying $b$-tagging, several
of the sizable background processes do not have a
$b$-quark in the final state, but are falsely identified as such.
This happens either because a light quark jet is falsely
identified as having a secondary vertex displaced from the
primary vertex due to tracking resolution (mistag) or
because charm quark decays happen to have a sufficiently
long lifetime to be tagged. Therefore, it is desirable
to have better separation of $b$-quark jets from charm
or light quark jets. The neural network jet flavor separator
is used to achieve this separation. As mentioned
before, the $b$ variable used in the EPD is a transformation
of the $b_{\rm NN}$ such that it ranges from 0 to 1. The
neural network jet flavor separator is a continuous variable
and the result of a neural network training that uses
a broad range of variables in order to identify $b$-quark jets
with high purity [57]. A variety of variables is suitable
to exploit the lifetime, mass, and decay multiplicity of
$b$-hadrons. Many of them are related to the reconstructed
secondary vertex; some are reflected by the properties of
the tracks in the SecVtx tagged jet. Including this factor
helps to discriminate signal from background events
and improves the final sensitivity.

The event probability discriminants are defined for
all the MC events that pass the analysis selection (see
Sect. IV) including events with at least one jet tagged by
SecVtx. This provides sufficient MC statistics except
for $W$ + LF and non-$W$ events, so in these cases events
with no tagged jets are also included.

The EPDs for MC events are defined independently
of the tagging category of the event, but later, when
the final templates are made, the events are weighted by
the corresponding tagging probability. These tagging
probabilities are the $b$-tagging correction factor ($\epsilon_{\rm tag}$)
used in Eqs. 3 and 4. They are functions of the flavor of
the quark, the tagging scale factor, and the mistag matrix,
a parametrization of the mistag rate. If a jet is matched
to a heavy-flavor hadron ($\Delta R$(jet, HF hadron) $< 0.4$)
and tagged by one of the $b$-tagging algorithms, the weight
is the corresponding tagging scale factor (shown in
Table II). If it is matched to a heavy-flavor hadron but
the jet is not tagged by any of the $b$-tagging algorithms,
the weight is set to zero. If the jet is not matched to
heavy flavor, it is assigned a weight equal to its mistag
probability (Section V C), regardless of whether or not
it was tagged, because the Monte Carlo simulation does
not properly model mistagging. On the other hand, for
observed events, tagging is required and the events are
not weighted by any tagging probability.
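
The three weighting rules for simulated jets can be written compactly.
The function and argument names below are hypothetical, and the
numerical values in the usage lines are placeholders, not the scale
factors of Table II or actual mistag probabilities.

```python
def jet_tag_weight(matched_to_heavy_flavor, is_tagged, scale_factor, mistag_probability):
    """Per-jet tagging weight for simulated events, following the three rules above."""
    if matched_to_heavy_flavor:
        # Heavy-flavor jets: scale factor if tagged, otherwise dropped.
        return scale_factor if is_tagged else 0.0
    # Jets without heavy flavor: mistag probability, whether tagged or not.
    return mistag_probability

# Placeholder inputs (hypothetical values).
weight_tagged_b = jet_tag_weight(True, True, 0.95, 0.01)
weight_light = jet_tag_weight(False, True, 0.95, 0.01)
```

Ignoring the simulated tag decision for jets without heavy flavor reflects
the statement that the simulation does not model mistagging properly.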

Since the neural network jet flavor separator $b_{\rm NN}$ is
defined only for SecVtx tagged jets, it requires a special
treatment for the events where any of the jets is
not tagged. The $b_{\rm NN}$ is used for each type of event; in the
cases where the jet is not tagged, the value of the $b_{\rm NN}$
is randomized using the light or non-$W$ flavor separator
template.

In the case of three-jet events (the same idea applies to
two-jet events), for Eq. 13 (EPD for the SVJP and
SVnoJP categories) the criteria for choosing $b$ are:

• if the three jets are SecVtx tagged, the $b$-jet probability
of one of them is chosen randomly;

• if two jets are SecVtx tagged, the $b$-jet probability
of one of them is chosen randomly;

• if one jet is SecVtx tagged, the $b$-jet probability
of that jet is used;

• if no jet is SecVtx tagged, the $b$-jet probability is
randomized (a random value is taken from the light
flavor template for $W$ + light events and from the
non-$W$ template for non-$W$ events) for each of the
3 jets and one of them is chosen randomly.

For Eq. 14 (EPD for the SVSV category), the criteria for
choosing $b_1$ and $b_2$ are:

• if the three jets are SecVtx tagged, the $b$-jet probabilities
of two of them are chosen randomly;

• if two jets are SecVtx tagged, the $b$-jet probabilities
of both of them are used (in random order);

• if one jet is SecVtx tagged, the $b$-jet probability of
the tagged jet and a random value out of the other
jets are used (in random order);

• if no jet is SecVtx tagged, the $b$-jet probability of
the three jets is randomized and two of them are
randomly chosen.
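
The selection rules listed above can be sketched as follows. The
function names, the jet dictionaries with keys "tagged" and "b", and
the representation of the flavor separator template as a list of
values to draw from are all hypothetical. The rule for one tagged jet
is read here as pairing the tagged value with a randomized value for
one of the untagged jets.

```python
import random

def choose_b_single(jets, template, rng):
    """Pick b for Eq. 13: a random tagged jet, else a randomized value."""
    tagged = [jet["b"] for jet in jets if jet["tagged"]]
    if tagged:
        return rng.choice(tagged)
    randomized = [rng.choice(template) for _ in jets]
    return rng.choice(randomized)

def choose_b_double(jets, template, rng):
    """Pick (b1, b2) for Eq. 14, in random order."""
    tagged = [jet["b"] for jet in jets if jet["tagged"]]
    if len(tagged) >= 2:
        return tuple(rng.sample(tagged, 2))
    if len(tagged) == 1:
        others = [rng.choice(template) for jet in jets if not jet["tagged"]]
        pair = [tagged[0], rng.choice(others)]
        rng.shuffle(pair)
        return tuple(pair)
    randomized = [rng.choice(template) for _ in jets]
    return tuple(rng.sample(randomized, 2))
```

Passing the random generator explicitly makes the randomized choices
reproducible, which helps when templates need to be rebuilt.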

In the search for SM Higgs boson production, twelve
separate EPD discriminants are created for each Higgs
boson mass point, given by the different $b$-tagging
categories (SVnoJP, SVJP, SVSV), the number of jets in
the final state (2 and 3 jets), and the type of leptons
(tight and EMC leptons). This gives the ability to tune
the discriminants independently. Figures 20 and 21 show
the signal and background templates, scaled to unit area,
for two- and three-jet events, respectively, for each signal
region. Note that in these figures all of the lepton
categories have been combined.

D. Validation of the discriminant output 

The performance of the Monte Carlo to predict the
distribution of each EPD is validated by checking the
untagged $W$+jets control samples, setting $b_{\rm NN} = 0.5$ so
that it does not affect the EPD. An example is shown
in Fig. 22 for $W$+2-jet and $W$+3-jet events. The agreement
in this control sample gives confidence that the
information used in this analysis is well modeled by the
Monte Carlo simulation.

FIG. 20: Templates of predictions for the signal ($m_H = 115$ GeV/$c^2$)
and background processes, each scaled to unit area, of
the ME discriminant, EPD, for 2-jet events for each signal region.

FIG. 21: Templates of predictions for the signal ($m_H = 115$ GeV/$c^2$)
and background processes, each scaled to unit area, of
the ME discriminant, EPD, for 3-jet events for each signal region.

The ME method used here is further validated through
its successful use in previous analyses at the CDF experiment
to observe small signals with large backgrounds in
final states similar to the one used here for the Higgs
boson search. The method was used in the untagged
$W$+jet sample to measure the cross section of diboson
production [17]. In addition, it was used successfully in
the tagged sample to measure the single top production
cross section [37]. In the latter, the modeling of the
discriminant output was also checked in a second control
region, events with four jets. In this sample, dominated
by top pair production, the EPD was also found to be
well modeled [54].



IX. SYSTEMATIC UNCERTAINTIES 

Systematic uncertainties can bias the outcome of this
analysis and have to be incorporated into the result. The
dominant systematic uncertainties addressed are from
several different sources: jet energy scale (JES), initial
state radiation (ISR), final state radiation (FSR), parton
distribution functions, lepton identification, luminosity,
and $b$-tagging scale factors.

Systematic uncertainties can influence both the expected
event yield (normalization) and the shape of the
discriminant distribution. The dominant rate uncertainties
have been included for each category. Shape uncertainties
have only been applied for the JES, which has a
small impact on the final sensitivity. Other shape
uncertainties are expected to be small. When the sensitivity to
signal events gets closer to the SM prediction, the result
will be more affected by sources of systematic uncertainties;
currently, this analysis is statistically limited.

Normalization uncertainties are estimated by recalculating
the acceptance using Monte Carlo samples altered
due to a specific systematic effect. The normalization
uncertainty is the difference between the systematically
shifted acceptance and the default one. The normalization
uncertainties for signal and background processes
are shown in Tables IX (for two-jet events) and X (for
three-jet events).¹
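
As a small illustration of this definition, the sketch below turns a
nominal and a shifted acceptance into a relative uncertainty in
percent, the unit used in Tables IX and X. The function names, the
symmetrization by taking the larger of the up and down shifts, and the
acceptance values are assumptions for illustration only.

```python
def relative_shift(nominal_acceptance, shifted_acceptance):
    """Fractional change of the acceptance under a systematic shift."""
    return (shifted_acceptance - nominal_acceptance) / nominal_acceptance

def normalization_uncertainty_percent(nominal, shifted_up, shifted_down):
    """Symmetrized uncertainty: the larger of the two shifts (an assumption)."""
    up = abs(relative_shift(nominal, shifted_up))
    down = abs(relative_shift(nominal, shifted_down))
    return 100.0 * max(up, down)

# Hypothetical acceptances, chosen only to show the arithmetic.
uncertainty = normalization_uncertainty_percent(0.0100, 0.0102, 0.0099)
```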

The effect of the uncertainty in the jet energy scale
is evaluated by applying jet-energy corrections that
describe $\pm 1\sigma$ variations to the default correction factor.
The JES shape uncertainty has been applied only to the
event probability discriminant for the two- and three-jet
events in the samples with the biggest contribution:
the $WH$ signal sample and the $W$ + jets and $t\bar{t}$
background samples. Shape variations due to the jet energy
scale for two- and three-jet $WH$ signal events are shown
in Fig. 23. The effect of the JES shape uncertainty on
the final sensitivity is small, on the order of only a few
percent. This is small compared to the effect of
normalization uncertainties.

¹ Note that empty entries in the table mean either that the
systematic is not relevant for that process (for example, background
rates that are derived from data are not affected by the
uncertainty on the luminosity measurement), or that it was studied
and found to be negligible (for example, the effect of the JES
uncertainty was studied for dibosons and top production and found to
have a negligible impact on the final sensitivity).

FIG. 22: Left (right): The discriminant output for the untagged $W$ + two (three) jets control sample shows that the Monte Carlo
$W$ + two (three) jets samples model the ME distribution of the observed events well.

TABLE IX: Normalization systematic uncertainties on the signal and background contributions for the 2 jets channel. Some
uncertainties are listed as ranges, as the impacts of the uncertain parameters depend on the tagging category. Systematic
uncertainties for $WH$ shown in this table are obtained for $m_H = 115$ GeV/$c^2$.

Relative Uncertainties (%)
Contribution                                W+HF      Mistags   Top       Diboson   Non-W     WH
Luminosity ($\sigma_{\rm inel}(p\bar{p})$)                      3.8       3.8                 3.8
Luminosity monitor                                              4.4       4.4                 4.4
Lepton ID                                                       2         2                   2
Jet energy scale                                                                              2
ISR+FSR+PDF                                                                                   3.1-5.6
$b$-tag efficiency                                              3.5-8.4   3.5-8.4             3.5-8.4
Cross section                                                   10        10                  10
HF fraction in $W$+jets                     30
Mistag rate                                           9-13.3
Non-$W$ rate                                                                        40

Systematic uncertainties due to the modeling of ISR
and FSR are obtained from dedicated Monte Carlo samples
for $WH$ signal events where the strength of ISR/FSR
is increased and decreased in the parton showering to
represent $\pm 1\sigma$ variations [58]. The effects of variations
in ISR and FSR are treated as 100% correlated with each
other.

To evaluate the uncertainty on the signal acceptance
associated with the specific choice of parton distribution
functions, events are reweighted based on different
PDF schemes. The twenty independent eigenvectors of
the CTEQ [42] PDFs are varied and compared to the
MRST [59] PDFs. The CTEQ and MRST PDF uncertainties
are summed in quadrature if the
difference between the CTEQ and MRST PDFs is larger
than the CTEQ uncertainty.
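
The combination rule can be sketched as below. The text only states
what happens when the CTEQ-MRST difference exceeds the CTEQ
uncertainty; using the CTEQ uncertainty alone in the other case is an
assumption, and the function and argument names are hypothetical.

```python
import math

def pdf_uncertainty(cteq_uncertainty, cteq_mrst_difference):
    """Combine the CTEQ eigenvector uncertainty with the CTEQ-MRST difference."""
    if abs(cteq_mrst_difference) > cteq_uncertainty:
        # Difference is the larger effect: add both in quadrature.
        return math.hypot(cteq_uncertainty, cteq_mrst_difference)
    # Assumed fallback: the CTEQ uncertainty already covers the difference.
    return cteq_uncertainty
```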

The lepton ID uncertainty is estimated by varying the
lepton ID correction factors, all up or all down
simultaneously. The yield is then calculated for each sample and
compared to the nominal prediction to obtain the
fractional uncertainty. The lepton ID uncertainty is applied
to the signal sample and all Monte Carlo based samples.

For the signal sample and all Monte Carlo based samples,
a systematic uncertainty is applied for the uncertainty
in the CDF luminosity measurement, which is correlated
across all samples and channels. This uncertainty
includes the uncertainty in the $p\bar{p}$ inelastic cross section
(3.8%) as well as the uncertainty in the acceptance of
CDF's luminosity monitor (4.4%) [60].

TABLE X: Normalization systematic uncertainties on the signal and background contributions for the 3 jets channel. Some
uncertainties are listed as ranges, as the impacts of the uncertain parameters depend on the tagging category. Systematic
uncertainties for $WH$ shown in this table are obtained for $m_H = 115$ GeV/$c^2$.

Relative Uncertainties (%)
Contribution                                W+HF      Mistags   Top       Diboson   Non-W     WH
Luminosity ($\sigma_{\rm inel}(p\bar{p})$)                      3.8       3.8                 3.8
Luminosity monitor                                              4.4       4.4                 4.4
Lepton ID                                                       2         2                   2
Jet energy scale                                                                              13.5-15.8
ISR+FSR+PDF                                                                                   13.1-21.4
$b$-tag efficiency                                              3.5-8.4   3.5-8.4             3.5-8.4
Cross section                                                   10        10                  10
HF fraction in $W$+jets                     30
Mistag rate                                           9-13.3
Non-$W$ rate                                                                        40

FIG. 23: Top (bottom): $WH$ ($m_H = 115$ GeV/$c^2$) JES shape
systematic for two (three) jet events. The plots show the
relative difference of one $\sigma$ up and one $\sigma$ down jet energy
correction with respect to the nominal correction.

The effect of the $b$-tagging scale factor uncertainty is
determined from the background estimate. The systematic
uncertainty on the event tagging efficiency is estimated
by varying the tagging scale factor and mistag
prediction by $\pm 1\sigma$ and calculating the difference between
the systematically shifted acceptance and the default one.

For all background processes the normalization uncertainties
are represented by the uncertainty on the predicted
number of background events and are incorporated
in the analysis as Gaussian constraints $G(\beta_j | 1, \Delta_j)$ in a
likelihood function [37]. The systematic uncertainties in
the normalizations of each source, $\beta_j$, are incorporated
into the likelihood as nuisance parameters, conforming
with a fully Bayesian treatment [61]. The correlations
between normalizations for a given source are taken into
account. The likelihood function is marginalized by
integrating over all nuisance parameters for many possible
values of the $WH$ cross section $\beta_1 = \beta_{WH}$. The
resulting reduced likelihood $\mathcal{L}(\beta_{WH})$ is a function of the
$WH$ cross section $\beta_{WH}$ only. More details on the
statistical treatment of the limit calculation are included in
Refs. [37, 52].



X. RESULTS 

The analysis is applied to observed events in a sample
corresponding to an integrated luminosity of 5.6 fb$^{-1}$.
The EPD output distribution of our candidate events, for
a Higgs boson mass of 115 GeV/$c^2$, is compared with
the sum of predicted $WH$ signal and background
distributions as shown in Fig. 24.

We search for an excess of Higgs boson signal events
in the EPD distributions, but no evidence of a signal
excess is found in the observed events. Thus, we perform
a binned likelihood fit to the EPD output distributions
to set an upper limit on SM Higgs boson production
associated with a $W$ boson for eleven values of $m_H$,
$100 \leq m_H \leq 150$ GeV/$c^2$ in 5 GeV/$c^2$ steps.

FIG. 24: Top (bottom): Comparison of the EPD output for lepton + 2 (3) jets observed events with the Monte Carlo
simulated events for $WH$ ($m_H = 115$ GeV/$c^2$) signal and background. From left to right: SVnoJP, SVJP, and SVSV tagged
observed events, respectively. Note that the signal is shown twice in these plots, as a stacked plot and as a histogram multiplied by 5
($\times 5$).

In order to extract the most probable $WH$ signal
content in the observed events, the maximum likelihood
method described before is performed. A marginalization
using the likelihood function is performed with all
systematic uncertainties included in the likelihood function.
The posterior p.d.f. is obtained by using Bayes' theorem:

$$p(\beta_{WH} | \mathrm{EPD}) = \frac{\mathcal{L}^*(\mathrm{EPD} | \beta_{WH})\, \pi(\beta_{WH})}{\int \mathcal{L}^*(\mathrm{EPD} | \beta'_{WH})\, \pi(\beta'_{WH})\, d\beta'_{WH}}$$

where $\mathcal{L}^*(\mathrm{EPD} | \beta_{WH})$ is the reduced likelihood and
$\pi(\beta_{WH})$ is the prior p.d.f. for $\beta_{WH}$. A flat prior,
$\pi(\beta_{WH}) = H(\beta_{WH})$, is adopted in this analysis, with $H$
being the Heaviside step function. To set an upper limit
on the $WH$ production cross section, the posterior
probability density is integrated to cover 95% [52].
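
The limit-setting step can be illustrated with a simplified sketch. It
assumes a binned Poisson likelihood without nuisance parameters, so the
marginalization over systematic uncertainties is omitted; the flat
prior restricted to non-negative signal strength follows the text. The
function names, the grid range, and the number of grid steps are
hypothetical choices, and every background bin is assumed to be
positive.

```python
import math

def log_likelihood(beta, observed, signal, background):
    """Binned Poisson log-likelihood for signal strength beta (no systematics)."""
    total = 0.0
    for n, s, b in zip(observed, signal, background):
        mu = beta * s + b
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total

def upper_limit(observed, signal, background, cl=0.95, beta_max=50.0, n_steps=5000):
    """Integrate the flat-prior posterior over beta >= 0 up to the requested coverage."""
    step = beta_max / n_steps
    betas = [i * step for i in range(n_steps + 1)]
    logs = [log_likelihood(beta, observed, signal, background) for beta in betas]
    peak = max(logs)
    density = [math.exp(value - peak) for value in logs]  # unnormalized posterior
    cumulative = [0.0]
    for i in range(1, len(density)):
        cumulative.append(cumulative[-1] + 0.5 * (density[i - 1] + density[i]) * step)
    target = cl * cumulative[-1]
    for beta, integral in zip(betas, cumulative):
        if integral >= target:
            return beta
    return beta_max
```

For a single bin with no observed events the posterior is a falling
exponential, and the 95% limit approaches the well-known value
$-\ln(0.05) \approx 3.0$ signal events, which serves as a sanity check.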

The observed and expected limits on
$\sigma(p\bar{p} \to WH) \times B(H \to b\bar{b})$, relative to the SM
prediction, for each Higgs boson mass point from 100 to
150 GeV/$c^2$ in 5 GeV/$c^2$ steps, with all $b$-tagging categories
and 2- and 3-jet events combined, are shown in Table XI and
in Fig. 25. The observed and expected limits in units of pb
are shown in Table XII.

TABLE XI: Expected and observed upper limit cross sections,
relative to the SM prediction, for different Higgs boson mass
points for 2- and 3-jet events.

2, 3 jets
$m_H$ (GeV/$c^2$)   100    105    110    115    120    125    130    135    140    145    150
Expected            2.5    2.7    3.0    3.5    4.4    5.1    6.6    8.7    13.0   17.8   27.5
Observed            2.1    2.6    3.2    3.6    4.6    5.3    8.3    9.2    14.8   18.9   35.3

TABLE XII: Expected and observed upper limit on
$\sigma(p\bar{p} \to WH) \times B(H \to b\bar{b})$ in units of pb for different
Higgs boson mass points for 2- and 3-jet events.

2, 3 jets
$m_H$ (GeV/$c^2$)   100    105    110    115    120    125    130    135    140    145    150
Expected            0.72   0.68   0.66   0.65   0.67   0.69   0.79   0.90   1.12   1.39   1.93
Observed            0.60   0.66   0.70   0.67   0.70   0.72   1.00   0.95   1.27   1.47   2.47

FIG. 25: 95% C.L. upper limits on the $WH$ production cross
section times branching ratio for $H \to b\bar{b}$ for Higgs boson
masses between $m_H = 100$ GeV/$c^2$ and $m_H = 150$ GeV/$c^2$.
The plot shows the limit normalized to the cross section
predictions from the standard model.

Tables XIII and XIV show the expected and observed
limits, for each Higgs boson mass point, for events with 2
and 3 jets, respectively. Including 3-jet events improves
the limit by 3 to 10%, depending on the Higgs boson
mass, with respect to the result using 2-jet events only.
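
The improvement quoted for including three-jet events can be checked
against the expected limits of Tables XI and XIII. The sketch below
assumes that the improvement is defined as the relative reduction of
the expected limit with respect to the two-jet result; the variable and
function names are hypothetical.

```python
masses = [100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150]
# Expected limits relative to the SM prediction (Tables XIII and XI).
expected_two_jet = [2.6, 2.8, 3.2, 3.7, 4.7, 5.5, 7.1, 9.5, 14.2, 19.7, 30.7]
expected_combined = [2.5, 2.7, 3.0, 3.5, 4.4, 5.1, 6.6, 8.7, 13.0, 17.8, 27.5]

def relative_improvement(two_jet, combined):
    """Percentage reduction of the limit when three-jet events are added."""
    return [100.0 * (t - c) / t for t, c in zip(two_jet, combined)]

improvements = relative_improvement(expected_two_jet, expected_combined)
for mass, gain in zip(masses, improvements):
    print(f"m_H = {mass} GeV/c^2: {gain:.1f}%")
```

With this definition the gain grows with the Higgs boson mass, from a few
percent at the lowest mass points to about ten percent at 150 GeV/$c^2$.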



TABLE XIII: Expected and observed upper limit cross sections,
relative to the SM prediction, for different Higgs boson
mass points in the 2 jets channel.

2 jets
$m_H$ (GeV/$c^2$)   100    105    110    115    120    125    130    135    140    145    150
Expected            2.6    2.8    3.2    3.7    4.7    5.5    7.1    9.5    14.2   19.7   30.7
Observed            2.7    3.3    3.7    4.5    5.9    6.8    9.6    12.0   19.3   24.0   43.2



TABLE XIV: Expected and observed upper limit cross sections,
relative to the SM prediction, for different Higgs boson
mass points in the 3 jets channel.

3 jets
$m_H$ (GeV/$c^2$)   100    105    110    115    120    125    130    135    140    145    150
Expected            12.2   12.9   13.9   15.8   19.5   23.0   28.1   39.5   56.1   77.9   120
Observed            5.1    5.6    8.6    8.5    10.8   12.4   17.3   22.9   33.7   42.5   81

XI. CONCLUSIONS

A search for Higgs boson production in association
with a $W$ boson using a matrix element technique
has been performed using 5.6 fb$^{-1}$ of CDF data. A
maximum likelihood technique has been applied to extract
the most probable $WH$ content in observed events.
No evidence is observed for a Higgs boson signal, and
95% confidence level upper limits are set. The limits
on the $WH$ production cross section times the branching
ratio of the Higgs boson to decay to $b\bar{b}$ pairs,
relative to the SM prediction, are
$\sigma(p\bar{p} \to WH) \times B(H \to b\bar{b})$/SM $<$ 2.1 to 35.3
for Higgs boson masses between $m_H = 100$ GeV/$c^2$
and $m_H = 150$ GeV/$c^2$. The expected (median)
sensitivity estimated in pseudoexperiments is
$\sigma(p\bar{p} \to WH) \times B(H \to b\bar{b})$/SM $<$ 2.5 to 27.5
at 95% C.L.

The search results in this channel at the CDF experiment
are the most sensitive low-mass Higgs boson
search at the Tevatron. While the LHC experiments
will soon have superior sensitivity to the low-mass Higgs
boson, this sensitivity comes primarily from searches in
the diphoton final state. Therefore, we expect that the
searches in the $H \to b\bar{b}$ channel at the Tevatron will provide
crucial information on the existence and nature of the
low-mass Higgs boson for years to come.